Provable Reinforcement Learning with a Short-Term Memory

Yonathan Efroni; Chi Jin; Akshay Krishnamurthy; Sobhan Miryoosefi

Provable Reinforcement Learning with a Short-Term Memory

Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:5832-5850, 2022.

Abstract

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-efroni22a,
  title = 	 {Provable Reinforcement Learning with a Short-Term Memory},
  author =       {Efroni, Yonathan and Jin, Chi and Krishnamurthy, Akshay and Miryoosefi, Sobhan},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {5832--5850},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/efroni22a/efroni22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/efroni22a.html},
  abstract = 	 {Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.}
}

Endnote

%0 Conference Paper
%T Provable Reinforcement Learning with a Short-Term Memory
%A Yonathan Efroni
%A Chi Jin
%A Akshay Krishnamurthy
%A Sobhan Miryoosefi
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato	
%F pmlr-v162-efroni22a
%I PMLR
%P 5832--5850
%U https://proceedings.mlr.press/v162/efroni22a.html
%V 162
%X Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.

APA


Efroni, Y., Jin, C., Krishnamurthy, A. & Miryoosefi, S.. (2022). Provable Reinforcement Learning with a Short-Term Memory. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:5832-5850 Available from https://proceedings.mlr.press/v162/efroni22a.html.

Related Material

Download PDF