Revisiting Simple Regret: Fast Rates for Returning a Good Arm

Yao Zhao, Connor Stephens, Csaba Szepesvari, Kwang-Sung Jun
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:42110-42158, 2023.

Abstract

Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to a lack of easy ways to characterize it. In this paper, we make significant progress on minimizing simple regret in both the data-rich ($T\ge n$) and data-poor ($T \le n$) regimes, where $n$ is the number of arms and $T$ is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, where we bound the probability of returning an arm whose mean reward is not within $\epsilon$ of the best (i.e., not $\epsilon$-good) for any choice of $\epsilon>0$, even though $\epsilon$ is not an input to SH. Our bound not only leads to an optimal worst-case simple regret bound of $\sqrt{n/T}$ up to logarithmic factors but also essentially matches the instance-dependent lower bound for returning an $\epsilon$-good arm reported by Katz-Samuels and Jamieson (2020). For the more challenging data-poor regime, we propose Bracketing SH (BSH), which enjoys the same improvement even without sampling each arm at least once. Our empirical study shows that BSH outperforms existing methods on real-world tasks.
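For readers unfamiliar with the algorithm at the paper's core: Sequential Halving splits the budget $T$ evenly across roughly $\log_2 n$ elimination rounds, samples every surviving arm equally within a round, and discards the empirically worse half each time. The sketch below is a minimal illustrative implementation in Python, not the authors' code; the pull(i) callable, which returns one stochastic reward for arm i, is an assumed interface.

import math

def sequential_halving(pull, n, T):
    # Sketch of vanilla Sequential Halving (Karnin et al., 2013).
    # pull(i) is an assumed callable returning one stochastic reward
    # for arm i; n is the number of arms, T the total sampling budget.
    arms = list(range(n))
    rounds = max(1, math.ceil(math.log2(n)))
    for _ in range(rounds):
        # The budget is split evenly over rounds and over surviving arms.
        pulls = max(1, T // (len(arms) * rounds))
        means = {i: sum(pull(i) for _ in range(pulls)) / pulls for i in arms}
        # Keep the empirically better half; at least one arm survives.
        arms.sort(key=lambda i: means[i], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]
    return arms[0]  # the arm SH recommends

Note that the paper's contribution is an improved analysis of this procedure (and the bracketing variant BSH for the $T \le n$ regime), not a modification of the procedure itself.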

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-zhao23g,
  title     = {Revisiting Simple Regret: Fast Rates for Returning a Good Arm},
  author    = {Zhao, Yao and Stephens, Connor and Szepesvari, Csaba and Jun, Kwang-Sung},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {42110--42158},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/zhao23g/zhao23g.pdf},
  url       = {https://proceedings.mlr.press/v202/zhao23g.html},
  abstract  = {Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to a lack of easy ways to characterize it. In this paper, we make significant progress on minimizing simple regret in both the data-rich ($T\ge n$) and data-poor ($T \le n$) regimes, where $n$ is the number of arms and $T$ is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, where we bound the probability of returning an arm whose mean reward is not within $\epsilon$ of the best (i.e., not $\epsilon$-good) for any choice of $\epsilon>0$, even though $\epsilon$ is not an input to SH. Our bound not only leads to an optimal worst-case simple regret bound of $\sqrt{n/T}$ up to logarithmic factors but also essentially matches the instance-dependent lower bound for returning an $\epsilon$-good arm reported by Katz-Samuels and Jamieson (2020). For the more challenging data-poor regime, we propose Bracketing SH (BSH), which enjoys the same improvement even without sampling each arm at least once. Our empirical study shows that BSH outperforms existing methods on real-world tasks.}
}
Endnote
%0 Conference Paper
%T Revisiting Simple Regret: Fast Rates for Returning a Good Arm
%A Yao Zhao
%A Connor Stephens
%A Csaba Szepesvari
%A Kwang-Sung Jun
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhao23g
%I PMLR
%P 42110--42158
%U https://proceedings.mlr.press/v202/zhao23g.html
%V 202
%X Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to a lack of easy ways to characterize it. In this paper, we make significant progress on minimizing simple regret in both the data-rich ($T\ge n$) and data-poor ($T \le n$) regimes, where $n$ is the number of arms and $T$ is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, where we bound the probability of returning an arm whose mean reward is not within $\epsilon$ of the best (i.e., not $\epsilon$-good) for any choice of $\epsilon>0$, even though $\epsilon$ is not an input to SH. Our bound not only leads to an optimal worst-case simple regret bound of $\sqrt{n/T}$ up to logarithmic factors but also essentially matches the instance-dependent lower bound for returning an $\epsilon$-good arm reported by Katz-Samuels and Jamieson (2020). For the more challenging data-poor regime, we propose Bracketing SH (BSH), which enjoys the same improvement even without sampling each arm at least once. Our empirical study shows that BSH outperforms existing methods on real-world tasks.
APA
Zhao, Y., Stephens, C., Szepesvari, C. & Jun, K. (2023). Revisiting Simple Regret: Fast Rates for Returning a Good Arm. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:42110-42158. Available from https://proceedings.mlr.press/v202/zhao23g.html.