Open Problem: Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters

Gianmarco Genalti, Alberto Maria Metelli
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:1-5, 2025.

Abstract

The heavy-tailed bandit problem (Bubeck et al., 2013) is a variant of the stochastic multi-armed bandit problem where the reward distributions have finite absolute raw moments of maximum order $1+\epsilon$, uniformly bounded by a constant $u < +\infty$, for some $\epsilon \in (0,1]$. In this setting, most of the proposed approaches crucially rely on the knowledge of both $\epsilon$ and $u$. Recent works have highlighted that adapting to these parameters when they are unknown is harder than adapting to the subgaussian constant or to the reward range in non-heavy-tailed bandits. It is known that it is impossible to adapt to either $\epsilon$ or $u$ without either ($i$) incurring extra regret or ($ii$) enforcing additional assumptions. However, the best attainable performance when no additional assumptions are made remains an open question. Moreover, the assumptions proposed in the literature are not comparable, as none of them is strictly weaker than the others. Thus, a second open question concerns the nature of the assumptions needed to compensate for this cost.
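For concreteness, here is a minimal Python sketch, not taken from the paper, of the truncated empirical mean and its deviation bound from Bubeck et al. (2013): both the truncation level and the confidence width depend explicitly on $\epsilon$ and $u$, which illustrates why most algorithms require these parameters as inputs. The Pareto reward distribution and the empirical proxy for $u$ below are illustrative assumptions, not part of the paper.

import numpy as np

def truncated_mean(samples, u, eps, delta):
    # Truncated empirical mean (Bubeck et al., 2013, Lemma 1):
    # sample s (1-indexed) is kept only if
    # |X_s| <= (u * s / log(1/delta))^(1 / (1 + eps)).
    # Both u and eps enter the truncation level: this is where
    # knowledge of the distributional parameters is used.
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    s = np.arange(1, n + 1)
    thresholds = (u * s / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    kept = np.where(np.abs(samples) <= thresholds, samples, 0.0)
    return kept.mean()

def confidence_width(n, u, eps, delta):
    # Matching deviation bound: with probability >= 1 - delta,
    # |hat_mu - mu| <= 4 u^{1/(1+eps)} (log(1/delta) / n)^{eps/(1+eps)}.
    # Again, the width cannot be computed without u and eps.
    return 4.0 * u ** (1.0 / (1.0 + eps)) \
        * (np.log(1.0 / delta) / n) ** (eps / (1.0 + eps))

# Illustrative example: Pareto rewards with tail index a = 1.5 have a
# finite moment of order 1 + eps only for eps < 0.5; we take eps = 0.4.
rng = np.random.default_rng(0)
a, eps = 1.5, 0.4
x = rng.pareto(a, size=10_000) + 1.0      # Pareto(a) samples on [1, inf)
u = np.mean(np.abs(x) ** (1 + eps))       # empirical proxy for the moment bound
print(truncated_mean(x, u=u, eps=eps, delta=0.01),
      confidence_width(len(x), u=u, eps=eps, delta=0.01))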

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-genalti25a,
  title     = {Open Problem: Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters},
  author    = {Genalti, Gianmarco and Metelli, Alberto Maria},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {1--5},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/genalti25a/genalti25a.pdf},
  url       = {https://proceedings.mlr.press/v291/genalti25a.html}
}
Endnote
%0 Conference Paper
%T Open Problem: Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters
%A Gianmarco Genalti
%A Alberto Maria Metelli
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-genalti25a
%I PMLR
%P 1--5
%U https://proceedings.mlr.press/v291/genalti25a.html
%V 291
APA
Genalti, G. & Metelli, A. M. (2025). Open Problem: Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:1-5. Available from https://proceedings.mlr.press/v291/genalti25a.html.