Quantile Bandits for Best Arms Identification

Mengyan Zhang; Cheng Soon Ong

Quantile Bandits for Best Arms Identification

Mengyan Zhang, Cheng Soon Ong

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12513-12523, 2021.

Abstract

We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound for the probability of arm misidentification, the first justification of a quantile based algorithm for fixed budget multiple best arms identification. We show illustrative experiments for best arm identification.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-zhang21o,
  title = 	 {Quantile Bandits for Best Arms Identification},
  author =       {Zhang, Mengyan and Ong, Cheng Soon},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {12513--12523},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/zhang21o/zhang21o.pdf},
  url = 	 {https://proceedings.mlr.press/v139/zhang21o.html},
  abstract = 	 {We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound for the probability of arm misidentification, the first justification of a quantile based algorithm for fixed budget multiple best arms identification. We show illustrative experiments for best arm identification.}
}

Endnote

%0 Conference Paper
%T Quantile Bandits for Best Arms Identification
%A Mengyan Zhang
%A Cheng Soon Ong
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-zhang21o
%I PMLR
%P 12513--12523
%U https://proceedings.mlr.press/v139/zhang21o.html
%V 139
%X We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound for the probability of arm misidentification, the first justification of a quantile based algorithm for fixed budget multiple best arms identification. We show illustrative experiments for best arm identification.

APA

Zhang, M. & Ong, C.S.. (2021). Quantile Bandits for Best Arms Identification. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12513-12523 Available from https://proceedings.mlr.press/v139/zhang21o.html.

Quantile Bandits for Best Arms Identification

Abstract

Cite this Paper

Related Material