Active Coverage for PAC Reinforcement Learning

Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:5044-5109, 2023.

Abstract

Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of “good coverage” really depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episodic Markov decision processes (MDPs), where the goal is to interact with the environment so as to fulfill given sampling requirements. This framework is sufficiently flexible to specify any desired coverage property, making it applicable to any problem that involves online exploration. Our main contribution is an instance-dependent lower bound on the sample complexity of active coverage and a simple game-theoretic algorithm, CovGame, that nearly matches it. We then show that CovGame can be used as a building block to solve different PAC RL tasks. In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are “easy to explore”, is lower than the minimax one. By further coupling this exploration algorithm with a new technique to do implicit eliminations in policy space, we obtain a computationally-efficient algorithm for best-policy identification whose instance-dependent sample complexity scales with gaps between policy values.
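The abstract phrases active coverage as meeting user-specified sampling requirements through online interaction. Below is a minimal Python sketch of that objective on a toy tabular MDP. It is not the authors' CovGame (which chooses exploration policies adaptively via a game-theoretic scheme); the random MDP, the naive uniform exploration policy, and all variable names are assumptions made purely for illustration.

# Illustrative sketch (not the paper's CovGame): "active coverage" asks that the
# empirical visitation counts n(h, s, a) dominate given sampling requirements
# c(h, s, a). The toy MDP and the uniform exploration policy below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
H, S, A = 3, 4, 2                                 # horizon, states, actions
P = rng.dirichlet(np.ones(S), size=(H, S, A))     # random tabular transition kernel
requirements = np.ones((H, S, A))                 # e.g. visit every reachable (h, s, a) once
requirements[0, 1:, :] = 0.0                      # the toy MDP always starts in state 0

def run_episode(policy):
    """Roll out one episode with a (possibly non-stationary) policy and
    return the visited (h, s, a) triples."""
    s, visited = 0, []
    for h in range(H):
        a = policy(h, s)
        visited.append((h, s, a))
        s = rng.choice(S, p=P[h, s, a])
    return visited

uniform = lambda h, s: rng.integers(A)            # naive exploration; CovGame would
                                                  # instead adapt the policy online
counts = np.zeros((H, S, A))
episodes = 0
while not np.all(counts >= requirements) and episodes < 10_000:
    for (h, s, a) in run_episode(uniform):
        counts[h, s, a] += 1
    episodes += 1

print(f"requirements met: {np.all(counts >= requirements)} after {episodes} episodes")

On MDPs that are hard to explore, such a fixed uniform policy may need many episodes (or fail within the budget) to meet the requirements; closing that gap with adaptive, instance-dependent exploration is what the paper's CovGame is designed for.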

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-al-marjani23a,
  title     = {Active Coverage for PAC Reinforcement Learning},
  author    = {Al-Marjani, Aymen and Tirinzoni, Andrea and Kaufmann, Emilie},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {5044--5109},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/al-marjani23a/al-marjani23a.pdf},
  url       = {https://proceedings.mlr.press/v195/al-marjani23a.html}
}
APA
Al-Marjani, A., Tirinzoni, A. & Kaufmann, E. (2023). Active Coverage for PAC Reinforcement Learning. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:5044-5109. Available from https://proceedings.mlr.press/v195/al-marjani23a.html.
