Inexperienced RL Agents Can’t Get It Right: Lower Bounds on Regret at Finite Sample Complexity

Maia Fraser, Vincent Létourneau
Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:327-334, 2022.

Abstract

We consider a family M of MDPs over given state and action spaces, and an agent that is sequentially confronted with tasks from M. Although stated for this stepwise change in distributions, the insight we develop is informative for continually changing distributions as well. In order to study how the structure of M, viewed as a learning environment, impacts the learning efficiency of the agent, we formulate an RL analog of the fat-shattering dimension for MDP families and show that it implies a nontrivial lower bound on regret whenever insufficiently many steps have been taken. More precisely, for some constant c which depends on shattering d states, an inexperienced agent that has explored the learning environment for fewer than d steps will necessarily have regret above c on some MDP in the family.
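The final claim of the abstract can be sketched as a formal statement. This is only a hedged rendering of the sentence above, not the paper's own theorem; the notation Regret_M(A, t) for the regret of agent A on MDP M after t exploration steps is assumed here for illustration:

```latex
% Hedged sketch of the stated lower bound (notation assumed, not taken from the paper):
% if the family $\mathcal{M}$ shatters $d$ states at some scale determining $c > 0$,
% then for every agent $A$ and every exploration budget $t < d$,
\[
  \sup_{M \in \mathcal{M}} \; \mathrm{Regret}_{M}(A, t) \;>\; c .
\]
```

That is, no matter which algorithm the agent runs, before d steps of experience some task in the family forces regret above c; the bound is worst-case over the family, not per-task.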

Cite this Paper


BibTeX
@InProceedings{pmlr-v199-fraser22a,
  title     = {Inexperienced RL Agents Can’t Get It Right: Lower Bounds on Regret at Finite Sample Complexity},
  author    = {Fraser, Maia and L\'etourneau, Vincent},
  booktitle = {Proceedings of The 1st Conference on Lifelong Learning Agents},
  pages     = {327--334},
  year      = {2022},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Precup, Doina},
  volume    = {199},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--24 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v199/fraser22a/fraser22a.pdf},
  url       = {https://proceedings.mlr.press/v199/fraser22a.html},
  abstract  = {We consider a family $\mathcal M$ of MDPs over given state and action spaces, and an agent that is sequentially confronted with tasks from $\mathcal M$. Although stated for this stepwise change in distributions, the insight we develop is informative for continually changing distributions as well. In order to study how structure of $\mathcal M$, viewed as a learning environment, impacts the learning efficiency of the agent, we formulate an RL analog of fat shattering dimension for MDP families and show that this implies a nontrivial lower bound on regret as long as insufficiently many steps have been taken. More precisely, for some constant $c$ which depends on shattering $d$ states, an inexperienced agent that has explored the learning environment for fewer than $d$ steps will necessarily have regret above $c$ on some MDP in the family.}
}
