Variance Control for Distributional Reinforcement Learning

Qi Kuang, Zhoufan Zhu, Liwen Zhang, Fan Zhou
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:17874-17895, 2023.

Abstract

Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we conduct an error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator, the Quantiled Expansion Mean (QEM), and introduce a new DRL algorithm (QEMRL) from a statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and MuJoCo benchmark tasks and demonstrate that QEMRL achieves significant improvements over baseline algorithms in terms of sample efficiency and convergence performance.
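For context, the sketch below (a minimal illustration, not the authors' QEM construction) shows the baseline estimator that quantile-based DRL methods such as QR-DQN use: the Q-value is recovered as the empirical mean of the N learned quantile values. This plain average is the kind of Q-function estimator whose bias and variance the paper analyzes; QEM replaces it with a lower-variance construction described in the paper. The function name and quantile values are purely illustrative.

    import numpy as np

    def quantile_q_estimate(quantiles: np.ndarray) -> float:
        # Baseline Q-value estimate in quantile-based DRL: the empirical mean
        # of N learned quantile values theta_1, ..., theta_N, taken at the
        # quantile midpoints tau_i = (2i - 1) / (2N).
        # QEM (this paper) replaces this plain average with a lower-variance
        # estimator; see the paper for the actual construction.
        return float(np.mean(quantiles))

    # Illustrative usage with hypothetical quantile values for one (s, a) pair.
    theta = np.array([-0.4, 0.1, 0.3, 0.9, 1.5])  # N = 5 learned quantiles
    q_hat = quantile_q_estimate(theta)
    print(f"Q(s, a) estimate: {q_hat:.3f}")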

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-kuang23a,
  title     = {Variance Control for Distributional Reinforcement Learning},
  author    = {Kuang, Qi and Zhu, Zhoufan and Zhang, Liwen and Zhou, Fan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {17874--17895},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/kuang23a/kuang23a.pdf},
  url       = {https://proceedings.mlr.press/v202/kuang23a.html},
  abstract  = {Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we do some error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator Quantiled Expansion Mean (QEM) and introduce a new DRL algorithm (QEMRL) from the statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and Mujoco benchmark tasks and demonstrate that QEMRL achieves significant improvement over baseline algorithms in terms of sample efficiency and convergence performance.}
}
Endnote
%0 Conference Paper
%T Variance Control for Distributional Reinforcement Learning
%A Qi Kuang
%A Zhoufan Zhu
%A Liwen Zhang
%A Fan Zhou
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-kuang23a
%I PMLR
%P 17874--17895
%U https://proceedings.mlr.press/v202/kuang23a.html
%V 202
%X Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we do some error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator Quantiled Expansion Mean (QEM) and introduce a new DRL algorithm (QEMRL) from the statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and Mujoco benchmark tasks and demonstrate that QEMRL achieves significant improvement over baseline algorithms in terms of sample efficiency and convergence performance.
APA
Kuang, Q., Zhu, Z., Zhang, L. & Zhou, F. (2023). Variance Control for Distributional Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:17874-17895. Available from https://proceedings.mlr.press/v202/kuang23a.html.
