Best Model Identification: A Rested Bandit Formulation

Leonardo Cella; Massimiliano Pontil; Claudio Gentile

Best Model Identification: A Rested Bandit Formulation

Leonardo Cella, Massimiliano Pontil, Claudio Gentile

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1362-1372, 2021.

Abstract

We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arms are themselves learning algorithms whose expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. We complement our analysis with lower bounds, indicating strengths and limitations of the proposed solution.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-cella21a,
  title = 	 {Best Model Identification: A Rested Bandit Formulation},
  author =       {Cella, Leonardo and Pontil, Massimiliano and Gentile, Claudio},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {1362--1372},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/cella21a/cella21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/cella21a.html},
  abstract = 	 {We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arms are themselves learning algorithms whose expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. We complement our analysis with lower bounds, indicating strengths and limitations of the proposed solution.}
}

Endnote

%0 Conference Paper
%T Best Model Identification: A Rested Bandit Formulation
%A Leonardo Cella
%A Massimiliano Pontil
%A Claudio Gentile
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-cella21a
%I PMLR
%P 1362--1372
%U https://proceedings.mlr.press/v139/cella21a.html
%V 139
%X We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arms are themselves learning algorithms whose expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. We complement our analysis with lower bounds, indicating strengths and limitations of the proposed solution.

APA

Cella, L., Pontil, M. & Gentile, C.. (2021). Best Model Identification: A Rested Bandit Formulation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1362-1372 Available from https://proceedings.mlr.press/v139/cella21a.html.

Best Model Identification: A Rested Bandit Formulation

Abstract

Cite this Paper

Related Material