Quantile Bandits for Best Arms Identification

Mengyan Zhang, Cheng Soon Ong
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12513-12523, 2021.

Abstract

We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound for the probability of arm misidentification, the first justification of a quantile based algorithm for fixed budget multiple best arms identification. We show illustrative experiments for best arm identification.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-zhang21o, title = {Quantile Bandits for Best Arms Identification}, author = {Zhang, Mengyan and Ong, Cheng Soon}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {12513--12523}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/zhang21o/zhang21o.pdf}, url = {https://proceedings.mlr.press/v139/zhang21o.html}, abstract = {We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound for the probability of arm misidentification, the first justification of a quantile based algorithm for fixed budget multiple best arms identification. We show illustrative experiments for best arm identification.} }
Endnote
%0 Conference Paper %T Quantile Bandits for Best Arms Identification %A Mengyan Zhang %A Cheng Soon Ong %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-zhang21o %I PMLR %P 12513--12523 %U https://proceedings.mlr.press/v139/zhang21o.html %V 139 %X We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound for the probability of arm misidentification, the first justification of a quantile based algorithm for fixed budget multiple best arms identification. We show illustrative experiments for best arm identification.
APA
Zhang, M. & Ong, C.S.. (2021). Quantile Bandits for Best Arms Identification. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12513-12523 Available from https://proceedings.mlr.press/v139/zhang21o.html.

Related Material