PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient

Kaixin Wang, Daquan Zhou, Jiashi Feng, Shie Mannor
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36694-36713, 2023.

Abstract

In model-free reinforcement learning, recent methods based on a phasic policy gradient (PPG) framework have shown impressive improvements in sample efficiency and zero-shot generalization on the challenging Procgen benchmark. In PPG, two design choices are believed to be the key contributing factors to its superior performance over PPO: the high level of value sample reuse and the low frequency of feature distillation. However, through an extensive empirical study, we unveil that policy regularization and data diversity are what actually matters. In particular, we can achieve the same level of performance with low value sample reuse and frequent feature distillation, as long as the policy regularization strength and data diversity are preserved. In addition, we can maintain the high performance of PPG while reducing the computational cost to a similar level as PPO. Our comprehensive study covers all 16 Procgen games in both sample efficiency and generalization setups. We hope it can advance the understanding of PPG and provide insights for future works.
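
To make the design choices discussed in the abstract concrete, the sketch below outlines the two-phase training loop that PPG (Cobbe et al., 2020) alternates between. It is a minimal illustration only: the function names, placeholder bodies, and hyperparameter values are assumptions for exposition, not the implementation studied in this paper.

# Minimal structural sketch of a PPG-style training loop (illustrative only).
# Placeholder functions stand in for environment interaction and gradient
# updates; hyperparameter values are assumptions, not the paper's settings.

def collect_rollouts(policy):
    # Placeholder: gather one iteration's worth of environment transitions.
    return {"obs": [], "actions": [], "returns": [], "old_policy_logits": []}

def ppo_update(policy, value_fn, batch):
    # Placeholder: one PPO-style clipped policy-gradient and value update
    # (the "policy phase").
    pass

def auxiliary_update(policy, value_fn, buffer, aux_epochs, beta_clone):
    # Placeholder "auxiliary phase":
    #   * value targets in `buffer` are revisited `aux_epochs` times
    #     (the level of value sample reuse),
    #   * the shared encoder is trained on those value targets
    #     (feature distillation),
    #   * a KL term weighted by `beta_clone` keeps the policy close to its
    #     behaviour before the auxiliary phase (policy regularization).
    pass

def train_ppg(policy, value_fn, n_phases, n_pi=32, aux_epochs=6, beta_clone=1.0):
    for _ in range(n_phases):
        buffer = []
        # Policy phase: n_pi ordinary PPO-style iterations. The auxiliary phase
        # runs (and distills features) only once every n_pi iterations, so a
        # larger n_pi means less frequent feature distillation.
        for _ in range(n_pi):
            batch = collect_rollouts(policy)
            ppo_update(policy, value_fn, batch)
            buffer.append(batch)  # data kept here sets the diversity seen by
                                  # the auxiliary phase
        auxiliary_update(policy, value_fn, buffer, aux_epochs, beta_clone)

In these terms, the paper's finding is that a small aux_epochs and a small n_pi (i.e., low value sample reuse and frequent feature distillation) can match the performance of the original settings, provided the strength of the policy regularization and the diversity of the data passed to the auxiliary phase are preserved.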

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-wang23aw,
  title     = {{PPG} Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient},
  author    = {Wang, Kaixin and Zhou, Daquan and Feng, Jiashi and Mannor, Shie},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {36694--36713},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/wang23aw/wang23aw.pdf},
  url       = {https://proceedings.mlr.press/v202/wang23aw.html}
}
EndNote
%0 Conference Paper
%T PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient
%A Kaixin Wang
%A Daquan Zhou
%A Jiashi Feng
%A Shie Mannor
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-wang23aw
%I PMLR
%P 36694--36713
%U https://proceedings.mlr.press/v202/wang23aw.html
%V 202
APA
Wang, K., Zhou, D., Feng, J. & Mannor, S. (2023). PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:36694-36713. Available from https://proceedings.mlr.press/v202/wang23aw.html.
