Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning

Toshinori Kitamura, Lingwei Zhu, Takamitsu Matsubara
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:918-931, 2021.

Abstract

The recent boom in the literature on entropy-regularized reinforcement learning (RL) approaches reveals that Kullback-Leibler (KL) regularization brings advantages to RL algorithms by canceling out errors under mild assumptions. However, existing analyses focus on fixed regularization with a constant weighting coefficient and do not consider cases where the coefficient is allowed to change dynamically. In this paper, we study the dynamic coefficient scheme and present the first asymptotic error bound for it. Based on this bound, we propose an effective scheme that tunes the coefficient according to the magnitude of the error, in favor of more robust learning. Complementing this development, we propose a novel algorithm, Geometric Value Iteration (GVI), that features a dynamic error-aware KL coefficient design aimed at mitigating the impact of errors on performance. Our experiments demonstrate that GVI can effectively exploit the trade-off between learning speed and robustness, compared with the uniform averaging induced by a constant KL coefficient. The combination of GVI and deep networks shows stable learning behavior even in the absence of a target network, where algorithms with a constant KL coefficient would oscillate strongly or even fail to converge.
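
To make the mechanism behind these claims concrete, the following is a minimal LaTeX sketch of the KL-regularized greedy step with an iteration-dependent coefficient lambda_k, written in the style of the mirror-descent value iteration analyses this line of work builds on. It is an illustrative reconstruction under that assumption, not the paper's exact formulation; the precise update, error terms, and coefficient schedule are given in the PDF linked below.

    \documentclass{article}
    \usepackage{amsmath}
    \begin{document}

    % KL-regularized greedy step with an iteration-dependent coefficient
    % $\lambda_k$ (illustrative sketch, not the paper's exact scheme).
    % The maximization has a closed form: a multiplicative update of the
    % previous policy.
    \begin{align*}
      \pi_{k+1}
        &= \operatorname*{arg\,max}_{\pi}\;
           \langle \pi, q_k \rangle
           - \lambda_k\, \mathrm{KL}(\pi \,\|\, \pi_k)
         \;\propto\; \pi_k \exp\!\left(q_k / \lambda_k\right).
    \end{align*}

    % Unrolling the recursion shows that the policy is a softmax over a
    % weighted sum of all past value estimates $q_j$, so the errors that
    % enter each $q_j$ are averaged rather than compounded:
    \begin{align*}
      \pi_{k+1} \;\propto\; \exp\!\Bigl( \sum_{j=0}^{k} q_j / \lambda_j \Bigr).
    \end{align*}

    \end{document}

Under this reading, a constant coefficient assigns the uniform weight 1/lambda to every past estimate (uniform error averaging), while a dynamic coefficient lambda_k reweights the sum, e.g., a larger lambda_k at an iteration with a large estimated error makes that unreliable estimate contribute less. This is the trade-off between learning speed and robustness that the abstract attributes to GVI's error-aware design.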

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-kitamura21a,
  title     = {Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning},
  author    = {Kitamura, Toshinori and Zhu, Lingwei and Matsubara, Takamitsu},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {918--931},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/kitamura21a/kitamura21a.pdf},
  url       = {https://proceedings.mlr.press/v157/kitamura21a.html}
}
Endnote
%0 Conference Paper
%T Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning
%A Toshinori Kitamura
%A Lingwei Zhu
%A Takamitsu Matsubara
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-kitamura21a
%I PMLR
%P 918--931
%U https://proceedings.mlr.press/v157/kitamura21a.html
%V 157
APA
Kitamura, T., Zhu, L. & Matsubara, T. (2021). Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:918-931. Available from https://proceedings.mlr.press/v157/kitamura21a.html.
