Competing against Adaptive Strategies in Online Learning via Hints

Aditya Bhaskara; Kamesh Munagala

Competing against Adaptive Strategies in Online Learning via Hints

Aditya Bhaskara, Kamesh Munagala

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10409-10424, 2023.

Abstract

For many of the classic online learning settings, it is known that having a “hint” about the loss function before making a prediction yields significantly better regret guarantees. In this work we study the question, do hints allow us to go beyond the standard notion of regret (which competes against the best fixed strategy) and compete against adaptive or dynamic strategies? After all, if hints were perfect, we can clearly compete against a fully dynamic strategy. For some common online learning settings, we provide upper and lower bounds for the switching regret, i.e., the difference between the loss incurred by the algorithm and the optimal strategy in hindsight that switches state at most

$L$ times, where

$L$ is some parameter. We show positive results for online linear optimization and the classic experts problem. Interestingly, such results turn out to be impossible for the classic bandit setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-bhaskara23a,
  title = 	 {Competing against Adaptive Strategies in Online Learning via Hints},
  author =       {Bhaskara, Aditya and Munagala, Kamesh},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {10409--10424},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/bhaskara23a/bhaskara23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/bhaskara23a.html},
  abstract = 	 {For many of the classic online learning settings, it is known that having a “hint” about the loss function before making a prediction yields significantly better regret guarantees. In this work we study the question, do hints allow us to go beyond the standard notion of regret (which competes against the best fixed strategy) and compete against adaptive or dynamic strategies? After all, if hints were perfect, we can clearly compete against a fully dynamic strategy. For some common online learning settings, we provide upper and lower bounds for the switching regret, i.e., the difference between the loss incurred by the algorithm and the optimal strategy in hindsight that switches state at most $L$ times, where $L$ is some parameter. We show positive results for online linear optimization and the classic experts problem. Interestingly, such results turn out to be impossible for the classic bandit setting.}
}

Endnote

%0 Conference Paper
%T Competing against Adaptive Strategies in Online Learning via Hints
%A Aditya Bhaskara
%A Kamesh Munagala
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-bhaskara23a
%I PMLR
%P 10409--10424
%U https://proceedings.mlr.press/v206/bhaskara23a.html
%V 206
%X For many of the classic online learning settings, it is known that having a “hint” about the loss function before making a prediction yields significantly better regret guarantees. In this work we study the question, do hints allow us to go beyond the standard notion of regret (which competes against the best fixed strategy) and compete against adaptive or dynamic strategies? After all, if hints were perfect, we can clearly compete against a fully dynamic strategy. For some common online learning settings, we provide upper and lower bounds for the switching regret, i.e., the difference between the loss incurred by the algorithm and the optimal strategy in hindsight that switches state at most $L$ times, where $L$ is some parameter. We show positive results for online linear optimization and the classic experts problem. Interestingly, such results turn out to be impossible for the classic bandit setting.

APA


Bhaskara, A. & Munagala, K.. (2023). Competing against Adaptive Strategies in Online Learning via Hints. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10409-10424 Available from https://proceedings.mlr.press/v206/bhaskara23a.html.

Related Material

Download PDF