Online Learning with Bounded Recall

Jon Schneider, Kiran Vodrahalli
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:43791-43803, 2024.

Abstract

We study the problem of full-information online learning in the “bounded recall” setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-bounded-recall if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that, unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
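To make the definition concrete, here is a minimal sketch of the "natural approach" mentioned in the abstract: Hedge run over only the last $M$ rounds. This is the baseline the abstract reports as failing, not the authors' stationary algorithm. The function names `windowed_hedge` and `per_round_regret`, the learning rate `eta`, and the assumption that rewards lie in $[0, 1]$ are illustrative choices and do not come from the paper.

```python
import numpy as np

def windowed_hedge(rewards, M, eta):
    """Hedge restricted to a sliding window (hypothetical helper).

    rewards: array of shape (T, K); rewards assumed to lie in [0, 1].
    Returns an array of shape (T, K): the distribution played in each round.
    The play at round t depends only on rewards[t-M:t], so the rule is
    M-bounded-recall in the sense of the abstract.
    """
    rewards = np.asarray(rewards, dtype=float)
    T, K = rewards.shape
    plays = np.empty((T, K))
    for t in range(T):
        window = rewards[max(0, t - M):t]   # at most the M most recent rounds
        scores = eta * window.sum(axis=0)   # cumulative reward inside the window
        scores -= scores.max()              # shift for numerical stability
        weights = np.exp(scores)
        plays[t] = weights / weights.sum()
    return plays

def per_round_regret(rewards, plays):
    """Average regret against the best fixed action in hindsight."""
    rewards = np.asarray(rewards, dtype=float)
    earned = (rewards * plays).sum()
    best_fixed = rewards.sum(axis=0).max()
    return (best_fixed - earned) / len(rewards)
```

Because the scores are a sum over the window, this rule is a symmetric function of the past $M$ rewards: it ignores their order. According to the abstract, any such rule must incur constant regret per round against some reward sequence.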

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-schneider24b,
  title = {Online Learning with Bounded Recall},
  author = {Schneider, Jon and Vodrahalli, Kiran},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {43791--43803},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/schneider24b/schneider24b.pdf},
  url = {https://proceedings.mlr.press/v235/schneider24b.html},
  abstract = {We study the problem of full-information online learning in the “bounded recall” setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-bounded-recall if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that, unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.}
}
Endnote
%0 Conference Paper
%T Online Learning with Bounded Recall
%A Jon Schneider
%A Kiran Vodrahalli
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-schneider24b
%I PMLR
%P 43791--43803
%U https://proceedings.mlr.press/v235/schneider24b.html
%V 235
%X We study the problem of full-information online learning in the “bounded recall” setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-bounded-recall if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that, unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
APA
Schneider, J. & Vodrahalli, K. (2024). Online Learning with Bounded Recall. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:43791-43803. Available from https://proceedings.mlr.press/v235/schneider24b.html.

Related Material