Stochastic bandits with arm-dependent delays

Manegueu Anne Gael; Claire Vernade; Alexandra Carpentier; Michal Valko

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3348-3356, 2020.

Abstract

Significant work has recently been dedicated to stochastic delayed bandits because of their relevance in applications. The applicability of existing algorithms is, however, restricted by the strong assumptions often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken these assumptions significantly and only assume that there is a bound on the tail of the delay. In particular, we cover the important case where the delay distributions vary across arms, and the case where the delays are heavy-tailed. Addressing these difficulties, we propose a simple but efficient UCB-based algorithm called PatientBandits. We provide both problem-dependent and problem-independent bounds on the regret as well as performance lower bounds.

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-gael20a,
  title = 	 {Stochastic bandits with arm-dependent delays},
  author =       {Gael, Manegueu Anne and Vernade, Claire and Carpentier, Alexandra and Valko, Michal},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {3348--3356},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/gael20a/gael20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/gael20a.html},
  abstract = 	 {Significant work has recently been dedicated to stochastic delayed bandits because of their relevance in applications. The applicability of existing algorithms is, however, restricted by the strong assumptions often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken these assumptions significantly and only assume that there is a bound on the tail of the delay. In particular, we cover the important case where the delay distributions vary across arms, and the case where the delays are heavy-tailed. Addressing these difficulties, we propose a simple but efficient UCB-based algorithm called PatientBandits. We provide both problem-dependent and problem-independent bounds on the regret as well as performance lower bounds.}
}

Endnote

%0 Conference Paper
%T Stochastic bandits with arm-dependent delays
%A Manegueu Anne Gael
%A Claire Vernade
%A Alexandra Carpentier
%A Michal Valko
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-gael20a
%I PMLR
%P 3348--3356
%U https://proceedings.mlr.press/v119/gael20a.html
%V 119
%X Significant work has recently been dedicated to stochastic delayed bandits because of their relevance in applications. The applicability of existing algorithms is, however, restricted by the strong assumptions often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken these assumptions significantly and only assume that there is a bound on the tail of the delay. In particular, we cover the important case where the delay distributions vary across arms, and the case where the delays are heavy-tailed. Addressing these difficulties, we propose a simple but efficient UCB-based algorithm called PatientBandits. We provide both problem-dependent and problem-independent bounds on the regret as well as performance lower bounds.

APA


Gael, M. A., Vernade, C., Carpentier, A., & Valko, M. (2020). Stochastic bandits with arm-dependent delays. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3348-3356. Available from https://proceedings.mlr.press/v119/gael20a.html.
