$(ε, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits

Gianmarco Genalti, Lupo Marsigli, Nicola Gatti, Alberto Maria Metelli
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1882-1915, 2024.

Abstract

Heavy-tailed distributions naturally arise in several settings, from finance to telecommunications. While regret minimization under subgaussian or bounded rewards has been widely studied, learning with heavy-tailed distributions has gained popularity only over the last decade. In this paper, we consider the setting in which the reward distributions have finite absolute raw moments of maximum order $1+\epsilon$, uniformly bounded by a constant $u<+\infty$, for some $\epsilon \in (0,1]$. In this setting, we study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner, which must adapt to them. First, we show that adaptation comes at a cost and derive two negative results proving that the regret guarantees of the non-adaptive case cannot be achieved without further assumptions. Then, we devise and analyze a fully data-driven trimmed mean estimator and propose a novel adaptive regret minimization algorithm, \texttt{AdaR-UCB}, that leverages such an estimator. Finally, we show that \texttt{AdaR-UCB} is the first algorithm that, under a known distributional assumption, enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
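
For intuition, here is a minimal Python sketch of the classical truncated (trimmed) mean estimator and the optimistic index built on it, in the style of Robust UCB (Bubeck et al., 2013), which requires $\epsilon$ and $u$ to be known. It is an illustrative baseline only: the paper's contribution is precisely to replace this known-parameter trimming threshold with a fully data-driven one inside \texttt{AdaR-UCB}, whose details are not reproduced here. The function names truncated_mean and ucb_index are ours, not the paper's.

    import numpy as np

    def truncated_mean(rewards, eps, u, delta):
        """Truncated empirical mean for heavy-tailed samples.

        Classical non-adaptive estimator: the t-th sample is kept only if
        |X_t| <= (u * t / log(1/delta))^(1/(1+eps)), then the (zeroed-out)
        samples are averaged. Requires eps and u, unlike the paper's
        data-driven trimmed mean."""
        rewards = np.asarray(rewards, dtype=float)
        n = len(rewards)
        t = np.arange(1, n + 1)
        thresholds = (u * t / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
        kept = np.where(np.abs(rewards) <= thresholds, rewards, 0.0)
        return kept.mean()

    def ucb_index(rewards, eps, u, delta):
        """Optimistic index: truncated mean plus a heavy-tailed bonus.

        The bonus scales as u^(1/(1+eps)) * (log(1/delta)/n)^(eps/(1+eps)),
        matching the concentration rate of the truncated mean."""
        n = len(rewards)
        bonus = 4.0 * u ** (1.0 / (1.0 + eps)) \
            * (np.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
        return truncated_mean(rewards, eps, u, delta) + bonus

    # Example on a heavy-tailed sample: Pareto(1.5) has finite moments
    # only of order strictly below 1.5, so eps = 0.4 (order 1.4) applies.
    rng = np.random.default_rng(0)
    samples = rng.pareto(1.5, size=1000)
    print(ucb_index(samples, eps=0.4, u=10.0, delta=0.01))

A bandit algorithm would maintain one such index per arm and pull the arm maximizing it; the negative results in the paper show that, without further assumptions, no choice of data-driven threshold can recover these guarantees when $\epsilon$ and $u$ are unknown.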

Cite this Paper

BibTeX
@InProceedings{pmlr-v247-genalti24a,
  title     = {$(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits},
  author    = {Genalti, Gianmarco and Marsigli, Lupo and Gatti, Nicola and Metelli, Alberto Maria},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {1882--1915},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/genalti24a/genalti24a.pdf},
  url       = {https://proceedings.mlr.press/v247/genalti24a.html}
}
APA
Genalti, G., Marsigli, L., Gatti, N. & Metelli, A. M. (2024). $(ε, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:1882-1915. Available from https://proceedings.mlr.press/v247/genalti24a.html.