Regret Minimization in Heavy-Tailed Bandits

Shubhada Agrawal, Sandeep K. Juneja, Wouter M. Koolen
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:26-62, 2021.

Abstract

We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order \((1+\epsilon)\) are uniformly bounded by a known constant \(B\), for some given \(\epsilon > 0\). We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.
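The truncated empirical mean that the abstract uses as a comparison point can be sketched in a few lines. This is not the paper's proposed index; it is the standard baseline estimator for heavy-tailed means, which zeroes out samples whose magnitude exceeds a growing threshold \(u_i = (B\, i / \log(1/\delta))^{1/(1+\epsilon)}\), under the same assumption of a \((1+\epsilon)\)-th moment bounded by \(B\). The function name and the confidence parameter `delta` are illustrative choices, not from the paper.

```python
import numpy as np

def truncated_mean(x, B, eps, delta=0.05):
    """Truncated empirical mean for heavy-tailed samples.

    Keeps sample i only if |x_i| <= u_i, where the truncation level
    u_i = (B * i / log(1/delta))**(1/(1+eps)) grows with i. Assumes the
    (1+eps)-th absolute moment is bounded by B; delta is the desired
    confidence level of the concentration guarantee.
    """
    x = np.asarray(x, dtype=float)
    i = np.arange(1, len(x) + 1)
    u = (B * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return float(np.mean(np.where(np.abs(x) <= u, x, 0.0)))
```

When every sample falls below its threshold, the estimator coincides with the plain empirical mean; a single extreme observation is simply discarded rather than dragging the average, which is what gives the estimator its sub-Gaussian-style concentration despite heavy tails.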

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-agrawal21a,
  title     = {Regret Minimization in Heavy-Tailed Bandits},
  author    = {Agrawal, Shubhada and Juneja, Sandeep K. and Koolen, Wouter M.},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {26--62},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/agrawal21a/agrawal21a.pdf},
  url       = {https://proceedings.mlr.press/v134/agrawal21a.html},
  abstract  = {We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order \((1+\epsilon)\) are uniformly bounded by a known constant \(B\), for some given \(\epsilon > 0\). We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.}
}
Endnote
%0 Conference Paper
%T Regret Minimization in Heavy-Tailed Bandits
%A Shubhada Agrawal
%A Sandeep K. Juneja
%A Wouter M. Koolen
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-agrawal21a
%I PMLR
%P 26--62
%U https://proceedings.mlr.press/v134/agrawal21a.html
%V 134
%X We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order \((1+\epsilon)\) are uniformly bounded by a known constant \(B\), for some given \(\epsilon > 0\). We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.
APA
Agrawal, S., Juneja, S.K. & Koolen, W.M. (2021). Regret Minimization in Heavy-Tailed Bandits. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:26-62. Available from https://proceedings.mlr.press/v134/agrawal21a.html.