Nonstochastic Bandits and Experts with Arm-Dependent Delays

Dirk Van Der Hoeven; Nicolò Cesa-Bianchi

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:2022-2044, 2022.

Abstract

We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms. While the setting in which delays only depend on time has been extensively studied, the arm-dependent delay setting better captures real-world applications at the cost of introducing new technical challenges. In the full information (experts) setting, we design an algorithm with a first-order regret bound that reveals an interesting trade-off between delays and losses. We prove a similar first-order regret bound also for the bandit setting, when the learner is allowed to observe how many losses are missing. Our bounds are the first in the delayed setting that only depend on the losses and delays of the best arm. In the bandit setting, when no information other than the losses is observed, we still manage to prove a regret bound for bandits through a modification to the algorithm of Zimmert and Seldin (2020). Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-van-der-hoeven22a,
  title = 	 {Nonstochastic Bandits and Experts with Arm-Dependent Delays},
  author =       {Van Der Hoeven, Dirk and Cesa-Bianchi, Nicol\`o},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2022--2044},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/van-der-hoeven22a/van-der-hoeven22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/van-der-hoeven22a.html},
  abstract = 	 {We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms. While the setting in which delays only depend on time has been extensively studied, the arm-dependent delay setting better captures real-world applications at the cost of introducing new technical challenges. In the full information (experts) setting, we design an algorithm with a first-order regret bound that reveals an interesting trade-off between delays and losses. We prove a similar first-order regret bound also for the bandit setting, when the learner is allowed to observe how many losses are missing. Our bounds are the first in the delayed setting that only depend on the losses and delays of the best arm. In the bandit setting, when no information other than the losses is observed, we still manage to prove a regret bound for bandits through a modification to the algorithm of Zimmert and Seldin (2020). Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.}
}

Endnote

%0 Conference Paper
%T Nonstochastic Bandits and Experts with Arm-Dependent Delays
%A Dirk Van Der Hoeven
%A Nicolò Cesa-Bianchi
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-van-der-hoeven22a
%I PMLR
%P 2022--2044
%U https://proceedings.mlr.press/v151/van-der-hoeven22a.html
%V 151
%X  We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms. While the setting in which delays only depend on time has been extensively studied, the arm-dependent delay setting better captures real-world applications at the cost of introducing new technical challenges. In the full information (experts) setting, we design an algorithm with a first-order regret bound that reveals an interesting trade-off between delays and losses. We prove a similar first-order regret bound also for the bandit setting, when the learner is allowed to observe how many losses are missing. Our bounds are the first in the delayed setting that only depend on the losses and delays of the best arm. In the bandit setting, when no information other than the losses is observed, we still manage to prove a regret bound for bandits through a modification to the algorithm of Zimmert and Seldin (2020). Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.

APA


Van Der Hoeven, D. & Cesa-Bianchi, N. (2022). Nonstochastic Bandits and Experts with Arm-Dependent Delays. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:2022-2044. Available from https://proceedings.mlr.press/v151/van-der-hoeven22a.html.
