[edit]
The Influence of Shape Constraints on the Thresholding Bandit Problem
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1228-1275, 2020.
Abstract
We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. On top of (i) the vanilla, unstructured \textit{TBP}, we consider the case where (ii) the sequence of arm’s means $(\mu_k)_k$ is monotonically increasing \textit{MTBP}, (iii) the case where $(\mu_k)_k$ is unimodal \textit{UTBP} and (iv) the case where $(\mu_k)_k$ is concave \textit{CTBP}. In the \textit{TBP} problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for \textit{TBP}, (ii) $\sqrt{\log(K)/T}$ for \textit{MTBP}, (iii) $\sqrt{K/T}$ for \textit{UTBP} and (iv) $\sqrt{\log\log K/T}$ for \textit{CTBP}, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the \textit{TBP}.