Regularized Q-learning through Robust Averaging

Peter Schmitt-Förster, Tobias Sutter
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:43742-43764, 2024.

Abstract

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins’ Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.
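For context, the baseline the abstract compares against is Watkins' tabular Q-learning, whose per-iteration cost comes from a single max over next-state action values. The sketch below shows that standard update on a small made-up deterministic MDP; it is not the paper's 2RA method, whose distributionally robust estimator replaces the plain max (see the paper for its closed form). All names and the toy MDP here are illustrative assumptions.

```python
# Illustrative sketch: Watkins' Q-learning (the baseline referenced in the
# abstract) on a tiny made-up deterministic MDP. The 2RA variant proposed in
# the paper replaces the plain max below with a distributionally robust
# estimator of the maximum expected value; that closed form is not shown here.
import numpy as np

def watkins_q_learning(P, R, gamma=0.9, alpha=0.1, episodes=2000, seed=0):
    """P[s, a] -> next state, R[s, a] -> reward (deterministic MDP)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(20):  # short epsilon-greedy rollout
            if rng.random() < 0.2:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r = int(P[s, a]), R[s, a]
            # Watkins' update: bootstrap with a plain max over next-state values
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q

# Toy two-state chain: action 1 in state 0 reaches the rewarding, absorbing
# state 1, where every action yields reward 1.
P = np.array([[0, 1], [1, 1]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])
Q = watkins_q_learning(P, R)
```

On this toy chain the fixed point is V(1) = 1/(1 - 0.9) = 10, so the learned greedy action in state 0 should be action 1.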

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-schmitt-forster24a,
  title     = {Regularized Q-learning through Robust Averaging},
  author    = {Schmitt-F\"{o}rster, Peter and Sutter, Tobias},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {43742--43764},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/schmitt-forster24a/schmitt-forster24a.pdf},
  url       = {https://proceedings.mlr.press/v235/schmitt-forster24a.html},
  abstract  = {We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.}
}
Endnote
%0 Conference Paper
%T Regularized Q-learning through Robust Averaging
%A Peter Schmitt-Förster
%A Tobias Sutter
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-schmitt-forster24a
%I PMLR
%P 43742--43764
%U https://proceedings.mlr.press/v235/schmitt-forster24a.html
%V 235
%X We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.
APA
Schmitt-Förster, P. & Sutter, T. (2024). Regularized Q-learning through Robust Averaging. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:43742-43764. Available from https://proceedings.mlr.press/v235/schmitt-forster24a.html.