VA-learning as a more efficient alternative to Q-learning

Yunhao Tang, Remi Munos, Mark Rowland, Michal Valko
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:33739-33757, 2023.

Abstract

In reinforcement learning, the advantage function is critical for policy improvement, but it is often extracted from a learned Q-function. A natural question is: why not learn the advantage function directly? In this work, we introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping, without explicit reference to Q-functions. VA-learning learns off-policy and enjoys theoretical guarantees similar to those of Q-learning. Thanks to directly learning the advantage and value functions, VA-learning improves sample efficiency over Q-learning, both in tabular implementations and in deep RL agents on Atari-57 games. We also identify a close connection between VA-learning and the dueling architecture, which partially explains why a simple architectural change to DQN agents tends to improve performance.
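
The abstract describes learning the value function V and the advantage function A directly with bootstrapped targets, rather than extracting A from a learned Q-table. The sketch below is one plausible tabular instantiation of that idea, not the paper's actual update rule: the shared TD error, the step sizes, the re-centering of A, and the toy chain MDP are all illustrative assumptions.

```python
# Hypothetical tabular sketch of a VA-style update: V and A are learned
# directly, with the implicit action value recovered as V(s) + A(s, a).
# The update rule and the toy MDP are illustrative assumptions, not the
# paper's exact algorithm.
import numpy as np

def va_update(V, A, s, a, r, s_next, done,
              alpha_v=0.1, alpha_a=0.1, gamma=0.99):
    """One bootstrapped update of the value table V and advantage table A."""
    target = r if done else r + gamma * (V[s_next] + A[s_next].max())
    td_error = target - (V[s] + A[s, a])   # implicit Q(s, a) = V(s) + A(s, a)
    V[s] += alpha_v * td_error             # value absorbs the state-level part
    A[s, a] += alpha_a * td_error          # advantage absorbs the action-specific part
    A[s] -= A[s].max()                     # re-center so max_a A(s, a) = 0, as in dueling heads

# Tiny demo on a 5-state chain: action 1 moves right (reward 1 on reaching the
# end), action 0 stays put. Data is collected by a uniform behavior policy, so
# the update is used off-policy; greedy actions w.r.t. A alone suffice for control.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
V = np.zeros(n_states)
A = np.zeros((n_states, n_actions))
for _ in range(2000):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)            # uniform behavior policy (off-policy data)
    s_next = min(s + 1, n_states - 1) if a == 1 else s
    done = s_next == n_states - 1
    r = 1.0 if done and a == 1 else 0.0
    va_update(V, A, s, a, r, s_next, done)
print("greedy actions:", A.argmax(axis=1))  # expect 1s (move right) in every state
```

In this sketch the Q-function never appears as a separate table; it exists only implicitly as V(s) + A(s, a), which mirrors the abstract's description of learning the two components directly and also resembles how a dueling DQN head decomposes its output.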

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-tang23h,
  title     = {{VA}-learning as a more efficient alternative to Q-learning},
  author    = {Tang, Yunhao and Munos, Remi and Rowland, Mark and Valko, Michal},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {33739--33757},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/tang23h/tang23h.pdf},
  url       = {https://proceedings.mlr.press/v202/tang23h.html}
}
APA
Tang, Y., Munos, R., Rowland, M. & Valko, M. (2023). VA-learning as a more efficient alternative to Q-learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:33739-33757. Available from https://proceedings.mlr.press/v202/tang23h.html.
