Provable Reinforcement Learning with a Short-Term Memory

Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:5832-5850, 2022.

Abstract

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.
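The "frame stacking" technique referenced in the abstract can be made concrete with a small sketch: treat the window of the most recent m observations and actions as the agent's effective state, so that a standard RL method can condition on this short-term memory rather than the full history. The sketch below is illustrative only; the class name, the environment interface, and the choice m = 4 are assumptions for this example and are not the paper's algorithm.

```python
from collections import deque


class FrameStackMemory:
    """Maintains the last m (observation, action) pairs as a short-term memory.

    Under the abstract's assumption that the latent state is decodable from
    the most recent length-m history, this stacked window can play the role
    of a (near-)Markovian state for a downstream policy.
    """

    def __init__(self, m: int):
        self.m = m
        self.window = deque(maxlen=m)  # oldest entries are dropped automatically

    def reset(self, first_obs):
        self.window.clear()
        self.window.append((first_obs, None))  # no action precedes the first observation

    def update(self, action, next_obs):
        self.window.append((next_obs, action))

    def state(self):
        # The stacked window is what the policy conditions on.
        return tuple(self.window)


# Hypothetical usage with an environment exposing reset()/step(action):
#   memory = FrameStackMemory(m=4)
#   obs = env.reset(); memory.reset(obs)
#   action = policy(memory.state())
#   obs, reward, done = env.step(action); memory.update(action, obs)
```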

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-efroni22a,
  title     = {Provable Reinforcement Learning with a Short-Term Memory},
  author    = {Efroni, Yonathan and Jin, Chi and Krishnamurthy, Akshay and Miryoosefi, Sobhan},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {5832--5850},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/efroni22a/efroni22a.pdf},
  url       = {https://proceedings.mlr.press/v162/efroni22a.html},
  abstract  = {Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.}
}
Endnote
%0 Conference Paper
%T Provable Reinforcement Learning with a Short-Term Memory
%A Yonathan Efroni
%A Chi Jin
%A Akshay Krishnamurthy
%A Sobhan Miryoosefi
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-efroni22a
%I PMLR
%P 5832--5850
%U https://proceedings.mlr.press/v162/efroni22a.html
%V 162
%X Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.
APA
Efroni, Y., Jin, C., Krishnamurthy, A. & Miryoosefi, S. (2022). Provable Reinforcement Learning with a Short-Term Memory. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:5832-5850. Available from https://proceedings.mlr.press/v162/efroni22a.html.
