Online decision making with history-average dependent costs

Vijeth Hebbar, Cedric Langbort
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:480-491, 2024.

Abstract

In many online sequential decision-making scenarios, a learner’s choices affect not just their current costs but also future ones. In this work, we study the particular case in which the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history-dependent costs as a problem of decision making under stage-wise constraints. To tackle it, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. The algorithm incorporates adaptive regularizers that depend explicitly on past decisions, which lets us enforce the stage-wise constraints while establishing tight regret bounds. We also discuss how the length of the history horizon affects the design of no-regret algorithms for our problem, and we present impossibility results when it equals the full learning horizon.
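
The setting can be illustrated with a small sketch. The snippet below is a hypothetical simulation, not the paper's FTARL: it uses a plain entropic (multiplicative-weights) FTRL update over the probability simplex and a toy linear stage cost whose coefficients shift with the average of the last H decisions. The horizon T, dimension d, history length H, step size, and cost model are all placeholder assumptions.

import numpy as np

# Illustrative sketch only (assumed setup, not the paper's FTARL): the stage
# cost at round t depends on the average of the learner's last H decisions,
# and the learner runs a generic entropic-regularized FTRL update over the
# probability simplex. All constants and the toy cost model are placeholders.

rng = np.random.default_rng(0)
T, d, H = 200, 3, 10                 # learning horizon, decision dimension, history horizon
eta = np.sqrt(np.log(d) / T)         # standard (non-adaptive) FTRL step size

def cost_vector(history_avg):
    # Toy linear stage cost whose coefficients drift with the history average
    # of past decisions -- this is what makes the cost history-dependent.
    return rng.uniform(0.0, 1.0, size=d) + 0.5 * history_avg

def ftrl_step(cum_grad, eta):
    # Entropic-regularized FTRL over the simplex (multiplicative weights);
    # subtract the minimum before exponentiating for numerical stability.
    w = np.exp(-eta * (cum_grad - cum_grad.min()))
    return w / w.sum()

decisions, cum_grad, total_cost = [], np.zeros(d), 0.0
for t in range(T):
    window = decisions[-H:] if decisions else [np.ones(d) / d]
    history_avg = np.mean(window, axis=0)        # time average over the history horizon
    x = ftrl_step(cum_grad, eta)                 # play the regularized-leader decision
    g = cost_vector(history_avg)                 # environment reveals the stage cost
    total_cost += g @ x
    cum_grad += g
    decisions.append(x)

print(f"average per-round cost: {total_cost / T:.3f}")

The paper's FTARL additionally makes the regularizer itself depend on past decisions so that the stage-wise constraints hold; that adaptive choice is not reproduced in this generic sketch.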

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-hebbar24a,
  title     = {Online decision making with history-average dependent costs},
  author    = {Hebbar, Vijeth and Langbort, Cedric},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {480--491},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/hebbar24a/hebbar24a.pdf},
  url       = {https://proceedings.mlr.press/v242/hebbar24a.html},
  abstract  = {In many online sequential decision-making scenarios, a learner’s choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.}
}
Endnote
%0 Conference Paper
%T Online decision making with history-average dependent costs
%A Vijeth Hebbar
%A Cedric Langbort
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-hebbar24a
%I PMLR
%P 480--491
%U https://proceedings.mlr.press/v242/hebbar24a.html
%V 242
%X In many online sequential decision-making scenarios, a learner’s choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.
APA
Hebbar, V. & Langbort, C. (2024). Online decision making with history-average dependent costs. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:480-491. Available from https://proceedings.mlr.press/v242/hebbar24a.html.