Regret Minimization in Heavy-Tailed Bandits

Shubhada Agrawal; Sandeep K. Juneja; Wouter M. Koolen

Regret Minimization in Heavy-Tailed Bandits

Shubhada Agrawal, Sandeep K. Juneja, Wouter M. Koolen

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:26-62, 2021.

Abstract

We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order \((1+\epsilon)\){are} uniformly bounded by a known constant \(B\), for some given \( \epsilon > 0\). We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.

Cite this Paper

BibTeX


@InProceedings{pmlr-v134-agrawal21a,
  title = 	 {Regret Minimization in Heavy-Tailed Bandits},
  author =       {Agrawal, Shubhada and Juneja, Sandeep K. and Koolen, Wouter M.},
  booktitle = 	 {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = 	 {26--62},
  year = 	 {2021},
  editor = 	 {Belkin, Mikhail and Kpotufe, Samory},
  volume = 	 {134},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v134/agrawal21a/agrawal21a.pdf},
  url = 	 {https://proceedings.mlr.press/v134/agrawal21a.html},
  abstract = 	 {We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order \((1+\epsilon)\){are} uniformly bounded by a known constant \(B\), for some given \( \epsilon > 0\). We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.}
}

Endnote

%0 Conference Paper
%T Regret Minimization in Heavy-Tailed Bandits
%A Shubhada Agrawal
%A Sandeep K. Juneja
%A Wouter M. Koolen
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe	
%F pmlr-v134-agrawal21a
%I PMLR
%P 26--62
%U https://proceedings.mlr.press/v134/agrawal21a.html
%V 134
%X We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order \((1+\epsilon)\){are} uniformly bounded by a known constant \(B\), for some given \( \epsilon > 0\). We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.

APA


Agrawal, S., Juneja, S.K. & Koolen, W.M.. (2021). Regret Minimization in Heavy-Tailed Bandits. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:26-62 Available from https://proceedings.mlr.press/v134/agrawal21a.html.

Related Material

Download PDF