Value-Evolutionary-Based Reinforcement Learning

Pengyi Li, Jianye Hao, Hongyao Tang, Yan Zheng, Fazl Barez
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:27875-27889, 2024.

Abstract

Combining Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for policy search has been proven to improve RL performance. However, previous works largely overlook value-based RL in favor of merging EAs with policy-based RL. This paper introduces Value-Evolutionary-Based Reinforcement Learning (VEB-RL) that focuses on the integration of EAs with value-based RL. The framework maintains a population of value functions instead of policies and leverages negative Temporal Difference error as the fitness metric for evolution. The metric is more sample-efficient for population evaluation than cumulative rewards and is closely associated with the accuracy of the value function approximation. Additionally, VEB-RL enables elites of the population to interact with the environment to offer high-quality samples for RL optimization, whereas the RL value function participates in the population’s evolution in each generation. Experiments on MinAtar and Atari demonstrate the superiority of VEB-RL in significantly improving DQN, Rainbow, and SPR. Our code is available at https://github.com/yeshenpy/VEB-RL.
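To make the fitness metric described above concrete, here is a minimal PyTorch sketch of scoring a population of Q-networks by negative TD error on a shared batch of transitions. All names (QNetwork, td_fitness) and the network architecture are illustrative assumptions, not the authors' released implementation; see the repository linked above for the actual code.

```python
# Sketch (under assumed names, not the authors' code): rank a population of
# Q-networks by negative TD error on one shared batch of transitions, which
# is cheaper than evaluating each member by full-episode cumulative reward.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP Q-function; the architecture is an assumption for illustration."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

@torch.no_grad()
def td_fitness(q_net, batch, gamma=0.99):
    """Fitness = negative mean absolute TD error on a batch of transitions."""
    obs, actions, rewards, next_obs, dones = batch
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    target = rewards + gamma * (1.0 - dones) * q_net(next_obs).max(dim=1).values
    td_error = (target - q_sa).abs().mean()
    return -td_error.item()  # higher fitness = smaller TD error

if __name__ == "__main__":
    obs_dim, n_actions, batch_size = 8, 4, 32
    population = [QNetwork(obs_dim, n_actions) for _ in range(5)]
    # Dummy transition batch standing in for samples from a replay buffer.
    batch = (
        torch.randn(batch_size, obs_dim),            # obs
        torch.randint(0, n_actions, (batch_size,)),  # actions
        torch.randn(batch_size),                     # rewards
        torch.randn(batch_size, obs_dim),            # next_obs
        torch.zeros(batch_size),                     # dones
    )
    fitness = [td_fitness(q, batch) for q in population]
    # Elites (highest fitness) would go on to interact with the environment.
    elites = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)[:2]
    print("fitness:", fitness, "elite indices:", elites)
```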

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-li24z,
  title     = {Value-Evolutionary-Based Reinforcement Learning},
  author    = {Li, Pengyi and Hao, Jianye and Tang, Hongyao and Zheng, Yan and Barez, Fazl},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {27875--27889},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24z/li24z.pdf},
  url       = {https://proceedings.mlr.press/v235/li24z.html},
  abstract  = {Combining Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for policy search has been proven to improve RL performance. However, previous works largely overlook value-based RL in favor of merging EAs with policy-based RL. This paper introduces Value-Evolutionary-Based Reinforcement Learning (VEB-RL) that focuses on the integration of EAs with value-based RL. The framework maintains a population of value functions instead of policies and leverages negative Temporal Difference error as the fitness metric for evolution. The metric is more sample-efficient for population evaluation than cumulative rewards and is closely associated with the accuracy of the value function approximation. Additionally, VEB-RL enables elites of the population to interact with the environment to offer high-quality samples for RL optimization, whereas the RL value function participates in the population’s evolution in each generation. Experiments on MinAtar and Atari demonstrate the superiority of VEB-RL in significantly improving DQN, Rainbow, and SPR. Our code is available on https://github.com/yeshenpy/VEB-RL.}
}
Endnote
%0 Conference Paper
%T Value-Evolutionary-Based Reinforcement Learning
%A Pengyi Li
%A Jianye Hao
%A Hongyao Tang
%A Yan Zheng
%A Fazl Barez
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-li24z
%I PMLR
%P 27875--27889
%U https://proceedings.mlr.press/v235/li24z.html
%V 235
%X Combining Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for policy search has been proven to improve RL performance. However, previous works largely overlook value-based RL in favor of merging EAs with policy-based RL. This paper introduces Value-Evolutionary-Based Reinforcement Learning (VEB-RL) that focuses on the integration of EAs with value-based RL. The framework maintains a population of value functions instead of policies and leverages negative Temporal Difference error as the fitness metric for evolution. The metric is more sample-efficient for population evaluation than cumulative rewards and is closely associated with the accuracy of the value function approximation. Additionally, VEB-RL enables elites of the population to interact with the environment to offer high-quality samples for RL optimization, whereas the RL value function participates in the population’s evolution in each generation. Experiments on MinAtar and Atari demonstrate the superiority of VEB-RL in significantly improving DQN, Rainbow, and SPR. Our code is available on https://github.com/yeshenpy/VEB-RL.
APA
Li, P., Hao, J., Tang, H., Zheng, Y. & Barez, F. (2024). Value-Evolutionary-Based Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:27875-27889. Available from https://proceedings.mlr.press/v235/li24z.html.