[edit]
The Influence of Shape Constraints on the Thresholding Bandit Problem
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1228-1275, 2020.
Abstract
We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. On top of (i) the vanilla, unstructured \textit{TBP}, we consider the case where (ii) the sequence of arm’s means (μk)k is monotonically increasing \textit{MTBP}, (iii) the case where (μk)k is unimodal \textit{UTBP} and (iv) the case where (μk)k is concave \textit{CTBP}. In the \textit{TBP} problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) √log(K)K/T for \textit{TBP}, (ii) √log(K)/T for \textit{MTBP}, (iii) √K/T for \textit{UTBP} and (iv) √loglogK/T for \textit{CTBP}, where K is the number of arms and T is the budget. These rates demonstrate that \textit{the dependence on K} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the \textit{TBP}.