The Influence of Shape Constraints on the Thresholding Bandit Problem

James Cheshire, Pierre Menard, Alexandra Carpentier
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1228-1275, 2020.

Abstract

We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. In addition to (i) the vanilla, unstructured \textit{TBP}, we consider the cases where the sequence of arm means $(\mu_k)_k$ is (ii) monotonically increasing (\textit{MTBP}), (iii) unimodal (\textit{UTBP}), and (iv) concave (\textit{CTBP}). In the \textit{TBP}, the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the largest gap between a misclassified arm's mean and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for \textit{TBP}, (ii) $\sqrt{\log(K)/T}$ for \textit{MTBP}, (iii) $\sqrt{K/T}$ for \textit{UTBP} and (iv) $\sqrt{\log\log K/T}$ for \textit{CTBP}, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that shape constraints fundamentally modify the nature of the \textit{TBP}.
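To make the regret definition and the four rates above concrete, the following is a minimal sketch, not code from the paper. The function names `simple_regret` and `minimax_rates` are hypothetical. It also assumes natural logarithms, drops all constants (the abstract gives rates only up to constants), and treats an arm whose mean equals the threshold as "above"; none of these choices is specified in the abstract.

```python
import math


def simple_regret(means, threshold, predicted_above):
    """Largest gap |mu_k - threshold| over misclassified arms (0.0 if none).

    `predicted_above` is the set of arm indices the learner outputs as
    having a mean above the threshold. Counting a mean equal to the
    threshold as "above" is an assumption made for this sketch.
    """
    worst = 0.0
    for k, mu in enumerate(means):
        truly_above = mu >= threshold
        if truly_above != (k in predicted_above):
            worst = max(worst, abs(mu - threshold))
    return worst


def minimax_rates(K, T):
    """Rates stated in the abstract, with constants omitted.

    Natural logarithms are assumed; K must be at least 3 so that
    log(log(K)) is positive.
    """
    return {
        "TBP": math.sqrt(math.log(K) * K / T),
        "MTBP": math.sqrt(math.log(K) / T),
        "UTBP": math.sqrt(K / T),
        "CTBP": math.sqrt(math.log(math.log(K)) / T),
    }


if __name__ == "__main__":
    # Arm 1 (mean 0.4) is wrongly reported as above the threshold 0.5.
    print(simple_regret([0.1, 0.4, 0.6, 0.9], 0.5, {1, 2, 3}))
    # Compare how the four rates scale for one (K, T) pair.
    for name, rate in minimax_rates(K=100, T=10_000).items():
        print(f"{name}: {rate:.4f}")
```

For a fixed budget, the sketch shows the ordering implied by the abstract once $K$ is moderately large: the concave rate is smallest, followed by the monotone, unimodal, and unstructured rates.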

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-cheshire20a,
  title = {The Influence of Shape Constraints on the Thresholding Bandit Problem},
  author = {Cheshire, James and Menard, Pierre and Carpentier, Alexandra},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages = {1228--1275},
  year = {2020},
  editor = {Abernethy, Jacob and Agarwal, Shivani},
  volume = {125},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v125/cheshire20a/cheshire20a.pdf},
  url = {https://proceedings.mlr.press/v125/cheshire20a.html},
  abstract = {We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. On top of (i) the vanilla, unstructured \textit{TBP}, we consider the case where (ii) the sequence of arm’s means $(\mu_k)_k$ is monotonically increasing \textit{MTBP}, (iii) the case where $(\mu_k)_k$ is unimodal \textit{UTBP} and (iv) the case where $(\mu_k)_k$ is concave \textit{CTBP}. In the \textit{TBP} problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for \textit{TBP}, (ii) $\sqrt{\log(K)/T}$ for \textit{MTBP}, (iii) $\sqrt{K/T}$ for \textit{UTBP} and (iv) $\sqrt{\log\log K/T}$ for \textit{CTBP}, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the \textit{TBP}.}
}
Endnote
%0 Conference Paper
%T The Influence of Shape Constraints on the Thresholding Bandit Problem
%A James Cheshire
%A Pierre Menard
%A Alexandra Carpentier
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-cheshire20a
%I PMLR
%P 1228--1275
%U https://proceedings.mlr.press/v125/cheshire20a.html
%V 125
%X We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. On top of (i) the vanilla, unstructured \textit{TBP}, we consider the case where (ii) the sequence of arm’s means $(\mu_k)_k$ is monotonically increasing \textit{MTBP}, (iii) the case where $(\mu_k)_k$ is unimodal \textit{UTBP} and (iv) the case where $(\mu_k)_k$ is concave \textit{CTBP}. In the \textit{TBP} problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for \textit{TBP}, (ii) $\sqrt{\log(K)/T}$ for \textit{MTBP}, (iii) $\sqrt{K/T}$ for \textit{UTBP} and (iv) $\sqrt{\log\log K/T}$ for \textit{CTBP}, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the \textit{TBP}.
APA
Cheshire, J., Menard, P. & Carpentier, A. (2020). The Influence of Shape Constraints on the Thresholding Bandit Problem. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:1228-1275. Available from https://proceedings.mlr.press/v125/cheshire20a.html.
