Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2995-3003, 2019.
In this paper, we propose and analyze conservative value iteration, which unifies value iteration, soft value iteration, advantage learning, and dynamic policy programming. Our analysis shows that algorithms using a combination of gap-increasing and max operators are resilient to stochastic errors, but not to non-stochastic errors. In contrast, algorithms using a softmax operator without a gap-increasing operator are less susceptible to all types of errors, but may display poor asymptotic performance. Algorithms using a combination of gap-increasing and softmax operators are much more effective and may asymptotically outperform algorithms with the max operator. Not only do these theoretical results provide a deep understanding of various reinforcement learning algorithms, but they also highlight the effectiveness of gap-increasing operators, as well as the limitations of traditional greedy value updates by the max operator.
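To make the operators discussed above concrete, the following is a minimal tabular sketch of a conservative-value-iteration-style update on a randomly generated toy MDP. The MDP, the hyperparameter values, and the exact log-average-exp form of the softmax are illustrative assumptions, not taken from the paper; the update combines a softmax backup with a gap-increasing term `alpha * (Psi - v)`, and the special cases noted in the comments mirror the unification described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9

# Random toy MDP (hypothetical example, not from the paper):
# P[s, a, s'] are transition probabilities, R[s, a] are rewards.
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.uniform(0.0, 1.0, size=(nS, nA))

def softmax_op(Q, beta):
    """Log-average-exp softmax over actions (one common choice of
    softmax operator); it tends to the max operator as beta -> infinity."""
    m = Q.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(beta * (Q - m)).mean(axis=1, keepdims=True)) / beta).squeeze(1)

def cvi_step(Psi, alpha, beta):
    """One update of action preferences Psi:
    Psi'(s,a) = r(s,a) + gamma * E[softmax Psi(s')]   (softmax backup)
                + alpha * (Psi(s,a) - softmax Psi(s)) (gap-increasing term)."""
    v = softmax_op(Psi, beta)              # soft state values, shape (nS,)
    return R + gamma * (P @ v) + alpha * (Psi - v[:, None])

Psi = np.zeros((nS, nA))
for _ in range(500):
    Psi = cvi_step(Psi, alpha=0.5, beta=10.0)

# Special cases of (alpha, beta):
#   alpha = 0, beta -> inf : value iteration
#   alpha = 0, beta finite : soft value iteration
#   alpha > 0, beta -> inf : advantage learning
#   alpha > 0, beta finite : a dynamic-policy-programming-like update
```

Sweeping `alpha` and `beta` in this sketch is one way to see the trade-offs the analysis formalizes: the gap-increasing term changes how errors accumulate across iterations, while the softmax temperature controls how closely the backup tracks the greedy max.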