Toward Efficient Gradient-Based Value Estimation

Arsalan Sharifnassab, Richard S. Sutton
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:30827-30849, 2023.

Abstract

Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has a large condition number. To resolve the adverse effect of the poor conditioning of MSBE on gradient-based methods, we propose a low-complexity, batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-sharifnassab23a,
  title     = {Toward Efficient Gradient-Based Value Estimation},
  author    = {Sharifnassab, Arsalan and Sutton, Richard S.},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {30827--30849},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/sharifnassab23a/sharifnassab23a.pdf},
  url       = {https://proceedings.mlr.press/v202/sharifnassab23a.html},
  abstract  = {Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has large condition-number. To resolve the adverse effect of poor conditioning of MSBE on gradient based methods, we propose a low complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than the residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.}
}
Endnote
%0 Conference Paper
%T Toward Efficient Gradient-Based Value Estimation
%A Arsalan Sharifnassab
%A Richard S. Sutton
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-sharifnassab23a
%I PMLR
%P 30827--30849
%U https://proceedings.mlr.press/v202/sharifnassab23a.html
%V 202
%X Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has large condition-number. To resolve the adverse effect of poor conditioning of MSBE on gradient based methods, we propose a low complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than the residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.
APA
Sharifnassab, A. & Sutton, R. S. (2023). Toward Efficient Gradient-Based Value Estimation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:30827-30849. Available from https://proceedings.mlr.press/v202/sharifnassab23a.html.