Exploration Conscious Reinforcement Learning Revisited

Lior Shani, Yonathan Efroni, Shie Mannor
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5680-5689, 2019.

Abstract

The Exploration-Exploitation tradeoff arises in Reinforcement Learning when one cannot tell whether a policy is optimal; there is then a constant need to explore new actions rather than exploit past experience. In practice, the tradeoff is commonly resolved by using a fixed exploration mechanism, such as $\epsilon$-greedy exploration or additive Gaussian noise, while still trying to learn an optimal policy. In this work, we take a different approach and study exploration-conscious criteria that result in policies that are optimal with respect to the exploration mechanism. As we establish, solving these criteria amounts to solving a surrogate Markov Decision Process. We then analyze properties of exploration-conscious optimal policies and characterize two general approaches for solving such criteria. Building on these approaches, we apply simple changes to existing tabular and deep Reinforcement Learning algorithms and empirically demonstrate superior performance relative to their non-exploration-conscious counterparts, for both discrete and continuous action spaces.
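
For concreteness, a minimal sketch of the $\epsilon$-greedy instance of the exploration-conscious criterion (the notation below is illustrative and may differ from the paper's exact formulation):

$$ \pi_\epsilon(a \mid s) \;=\; (1-\epsilon)\,\pi(a \mid s) \;+\; \frac{\epsilon}{|\mathcal{A}|}, \qquad \pi^{*}_{\epsilon} \;\in\; \arg\max_{\pi} \; V^{\pi_\epsilon}(s) \;\; \forall s, $$

where $\pi_\epsilon$ is the policy actually executed in the environment, $\mathcal{A}$ is the (finite) action set, and $V^{\pi_\epsilon}$ is the standard value function. Under this reading, the criterion seeks the policy whose $\epsilon$-greedy perturbation attains the highest value, rather than the optimal policy of the unperturbed MDP.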

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-shani19a,
  title     = {Exploration Conscious Reinforcement Learning Revisited},
  author    = {Shani, Lior and Efroni, Yonathan and Mannor, Shie},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {5680--5689},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/shani19a/shani19a.pdf},
  url       = {https://proceedings.mlr.press/v97/shani19a.html}
}
Endnote
%0 Conference Paper
%T Exploration Conscious Reinforcement Learning Revisited
%A Lior Shani
%A Yonathan Efroni
%A Shie Mannor
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-shani19a
%I PMLR
%P 5680--5689
%U https://proceedings.mlr.press/v97/shani19a.html
%V 97
APA
Shani, L., Efroni, Y. & Mannor, S. (2019). Exploration Conscious Reinforcement Learning Revisited. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5680-5689. Available from https://proceedings.mlr.press/v97/shani19a.html.