Partially Observable Reinforcement Learning with Memory Traces

Onno Eberhard, Michael Muehlebach, Claire Vernade
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:14934-14949, 2025.

Abstract

Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce memory traces. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the return errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.
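To make the idea concrete, below is a minimal sketch of a memory trace as an exponential moving average of observation features, as described in the abstract. This is an illustrative reading, not the paper's implementation: the symbols z and lam, the one-hot feature encoding, and the exact recursion z <- (1 - lam) * z + lam * phi(o) are assumptions made for the example.

    import numpy as np

    def update_trace(z, phi, lam=0.5):
        # Exponential-moving-average memory trace: recent observations dominate,
        # older ones decay geometrically. The recursion and the parameter name
        # "lam" are illustrative assumptions, not quoted from the paper.
        return (1.0 - lam) * z + lam * phi

    n_obs = 4                          # toy observation alphabet of size 4
    z = np.zeros(n_obs)                # trace starts at zero
    for o in [2, 0, 3, 3, 1]:          # a short observation sequence
        phi = np.eye(n_obs)[o]         # one-hot encoding of observation o
        z = update_trace(z, phi)
    print(z)                           # fixed-size summary of the whole history

Note that the trace keeps a fixed dimension no matter how long the history grows, which is what makes it a compact alternative to storing an ever-longer window of past observations.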

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-eberhard25a,
  title = {Partially Observable Reinforcement Learning with Memory Traces},
  author = {Eberhard, Onno and Muehlebach, Michael and Vernade, Claire},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {14934--14949},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/eberhard25a/eberhard25a.pdf},
  url = {https://proceedings.mlr.press/v267/eberhard25a.html},
  abstract = {Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce memory traces. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the return errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.}
}
Endnote
%0 Conference Paper
%T Partially Observable Reinforcement Learning with Memory Traces
%A Onno Eberhard
%A Michael Muehlebach
%A Claire Vernade
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-eberhard25a
%I PMLR
%P 14934--14949
%U https://proceedings.mlr.press/v267/eberhard25a.html
%V 267
%X Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce memory traces. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the return errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.
APA
Eberhard, O., Muehlebach, M. & Vernade, C. (2025). Partially Observable Reinforcement Learning with Memory Traces. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:14934-14949. Available from https://proceedings.mlr.press/v267/eberhard25a.html.
