Online decision making with history-average dependent costs

Vijeth Hebbar, Cedric Langbort
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:480-491, 2024.

Abstract

In many online sequential decision-making scenarios, a learner’s choices affect not just their current costs but also future ones. In this work, we study the particular case in which the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history-dependent costs as a problem of decision making under stage-wise constraints. To tackle it, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. The algorithm incorporates adaptive regularizers that depend explicitly on past decisions, which lets us enforce the stage-wise constraints while establishing tight regret bounds. We also discuss how the length of the history horizon affects the design of no-regret algorithms for our problem, and we present impossibility results when it equals the full learning horizon.
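
The setting can be illustrated with a small sketch. The snippet below is a hypothetical simulation, not the paper's FTARL: it uses a plain entropic (multiplicative-weights) FTRL update over the probability simplex and a toy linear stage cost whose coefficients shift with the average of the last H decisions. The horizon T, dimension d, history length H, step size, and cost model are all placeholder assumptions.

import numpy as np

# Illustrative sketch only (assumed setup, not the paper's FTARL): the stage
# cost at round t depends on the average of the learner's last H decisions,
# and the learner runs a generic entropic-regularized FTRL update over the
# probability simplex. All constants and the toy cost model are placeholders.

rng = np.random.default_rng(0)
T, d, H = 200, 3, 10                 # learning horizon, decision dimension, history horizon
eta = np.sqrt(np.log(d) / T)         # standard (non-adaptive) FTRL step size

def cost_vector(history_avg):
    # Toy linear stage cost whose coefficients drift with the history average
    # of past decisions -- this is what makes the cost history-dependent.
    return rng.uniform(0.0, 1.0, size=d) + 0.5 * history_avg

def ftrl_step(cum_grad, eta):
    # Entropic-regularized FTRL over the simplex (multiplicative weights);
    # subtract the minimum before exponentiating for numerical stability.
    w = np.exp(-eta * (cum_grad - cum_grad.min()))
    return w / w.sum()

decisions, cum_grad, total_cost = [], np.zeros(d), 0.0
for t in range(T):
    window = decisions[-H:] if decisions else [np.ones(d) / d]
    history_avg = np.mean(window, axis=0)        # time average over the history horizon
    x = ftrl_step(cum_grad, eta)                 # play the regularized-leader decision
    g = cost_vector(history_avg)                 # environment reveals the stage cost
    total_cost += g @ x
    cum_grad += g
    decisions.append(x)

print(f"average per-round cost: {total_cost / T:.3f}")

The paper's FTARL additionally makes the regularizer itself depend on past decisions so that the stage-wise constraints hold; that adaptive choice is not reproduced in this generic sketch.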

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-hebbar24a,
  title     = {Online decision making with history-average dependent costs},
  author    = {Hebbar, Vijeth and Langbort, Cedric},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {480--491},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/hebbar24a/hebbar24a.pdf},
  url       = {https://proceedings.mlr.press/v242/hebbar24a.html},
  abstract  = {In many online sequential decision-making scenarios, a learner’s choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.}
}
Endnote
%0 Conference Paper
%T Online decision making with history-average dependent costs
%A Vijeth Hebbar
%A Cedric Langbort
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-hebbar24a
%I PMLR
%P 480--491
%U https://proceedings.mlr.press/v242/hebbar24a.html
%V 242
%X In many online sequential decision-making scenarios, a learner’s choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.
APA
Hebbar, V. & Langbort, C. (2024). Online decision making with history-average dependent costs. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:480-491. Available from https://proceedings.mlr.press/v242/hebbar24a.html.