Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:578-598, 2021.

Abstract

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the \emph{non-stationary case} in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega((H^3SA/\varepsilon^2)\log(1/\delta))$ on the sample complexity of an $(\varepsilon,\delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of “hard MDPs” which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-domingues21a,
  title     = {Episodic Reinforcement Learning in Finite {MDP}s: Minimax Lower Bounds Revisited},
  author    = {Domingues, Omar Darwiche and M{\'e}nard, Pierre and Kaufmann, Emilie and Valko, Michal},
  booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  pages     = {578--598},
  year      = {2021},
  editor    = {Vitaly Feldman and Katrina Ligett and Sivan Sabato},
  volume    = {132},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--19 Mar},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v132/domingues21a/domingues21a.pdf},
  url       = {http://proceedings.mlr.press/v132/domingues21a.html},
  abstract  = {In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the \emph{non-stationary case} in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega((H^3SA/\varepsilon^2)\log(1/\delta))$ on the sample complexity of an $(\varepsilon,\delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of ``hard MDPs'' which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.}
}
Endnote
%0 Conference Paper
%T Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
%A Omar Darwiche Domingues
%A Pierre Ménard
%A Emilie Kaufmann
%A Michal Valko
%B Proceedings of the 32nd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Vitaly Feldman
%E Katrina Ligett
%E Sivan Sabato
%F pmlr-v132-domingues21a
%I PMLR
%P 578--598
%U http://proceedings.mlr.press/v132/domingues21a.html
%V 132
%X In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the \emph{non-stationary case} in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega((H^3SA/\varepsilon^2)\log(1/\delta))$ on the sample complexity of an $(\varepsilon,\delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of “hard MDPs” which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
APA
Domingues, O. D., Ménard, P., Kaufmann, E. & Valko, M. (2021). Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:578-598. Available from http://proceedings.mlr.press/v132/domingues21a.html.