Online Learning from Optimal Actions

Omar Besbes; Yuri Fonseca; Ilan Lobel

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:586-586, 2021.

Abstract

We study the problem of online contextual optimization where, at each period, instead of observing the loss, we observe, after-the-fact, the optimal action an oracle with full knowledge of the objective function would have taken. At each period, the decision-maker has access to a new set of feasible actions to select from and to a new contextual function that affects that period’s loss function. We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle. We obtain the first regret bound for this problem that is logarithmic in the time horizon. Our results are derived through the development and analysis of a novel algorithmic structure that leverages the underlying geometry of the problem.

Cite this Paper

BibTeX

@InProceedings{pmlr-v134-besbes21a,
  title = 	 {Online Learning from Optimal Actions},
  author =       {Besbes, Omar and Fonseca, Yuri and Lobel, Ilan},
  booktitle = 	 {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = 	 {586--586},
  year = 	 {2021},
  editor = 	 {Belkin, Mikhail and Kpotufe, Samory},
  volume = 	 {134},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v134/besbes21a/besbes21a.pdf},
  url = 	 {https://proceedings.mlr.press/v134/besbes21a.html},
  abstract = 	 {We study the problem of online contextual optimization where, at each period, instead of observing the loss, we observe, after-the-fact, the optimal action an oracle with full knowledge of the objective function would have taken. At each period, the decision-maker has access to a new set of feasible actions to select from and to a new contextual function that affects that period’s loss function. We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle. We obtain the first regret bound for this problem that is logarithmic in the time horizon. Our results are derived through the development and analysis of a novel algorithmic structure that leverages the underlying geometry of the problem.}
}

Endnote

%0 Conference Paper
%T Online Learning from Optimal Actions
%A Omar Besbes
%A Yuri Fonseca
%A Ilan Lobel
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-besbes21a
%I PMLR
%P 586--586
%U https://proceedings.mlr.press/v134/besbes21a.html
%V 134
%X We study the problem of online contextual optimization where, at each period, instead of observing the loss, we observe, after-the-fact, the optimal action an oracle with full knowledge of the objective function would have taken. At each period, the decision-maker has access to a new set of feasible actions to select from and to a new contextual function that affects that period’s loss function. We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle. We obtain the first regret bound for this problem that is logarithmic in the time horizon. Our results are derived through the development and analysis of a novel algorithmic structure that leverages the underlying geometry of the problem.

APA

Besbes, O., Fonseca, Y. & Lobel, I. (2021). Online Learning from Optimal Actions. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:586-586. Available from https://proceedings.mlr.press/v134/besbes21a.html.
