The Influence of Shape Constraints on the Thresholding Bandit Problem

James Cheshire; Pierre Menard; Alexandra Carpentier

The Influence of Shape Constraints on the Thresholding Bandit Problem

James Cheshire, Pierre Menard, Alexandra Carpentier

Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1228-1275, 2020.

Abstract

We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. On top of (i) the vanilla, unstructured \textit{TBP}, we consider the case where (ii) the sequence of arm’s means $(\mu_k)_k$ is monotonically increasing \textit{MTBP}, (iii) the case where $(\mu_k)_k$ is unimodal \textit{UTBP} and (iv) the case where $(\mu_k)_k$ is concave \textit{CTBP}. In the \textit{TBP} problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for \textit{TBP}, (ii) $\sqrt{\log(K)/T}$ for \textit{MTBP}, (iii) $\sqrt{K/T}$ for \textit{UTBP} and (iv) $\sqrt{\log\log K/T}$ for \textit{CTBP}, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the \textit{TBP}.

Cite this Paper

BibTeX


@InProceedings{pmlr-v125-cheshire20a,
  title = 	 {The Influence of Shape Constraints on the Thresholding Bandit Problem},
  author =       {Cheshire, James and Menard, Pierre and Carpentier, Alexandra},
  booktitle = 	 {Proceedings of Thirty Third Conference on Learning Theory},
  pages = 	 {1228--1275},
  year = 	 {2020},
  editor = 	 {Abernethy, Jacob and Agarwal, Shivani},
  volume = 	 {125},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--12 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v125/cheshire20a/cheshire20a.pdf},
  url = 	 {https://proceedings.mlr.press/v125/cheshire20a.html},
  abstract = 	 { We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. On top of (i) the vanilla, unstructured \textit{TBP}, we consider the case where (ii) the sequence of arm’s means $(\mu_k)_k$ is monotonically increasing \textit{MTBP}, (iii) the case where $(\mu_k)_k$ is unimodal \textit{UTBP} and (iv) the case where $(\mu_k)_k$ is concave \textit{CTBP}. In the \textit{TBP} problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for \textit{TBP}, (ii) $\sqrt{\log(K)/T}$ for \textit{MTBP}, (iii) $\sqrt{K/T}$ for \textit{UTBP} and (iv) $\sqrt{\log\log K/T}$ for \textit{CTBP}, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the \textit{TBP}.}
}

Endnote

%0 Conference Paper
%T The Influence of Shape Constraints on the Thresholding Bandit Problem
%A James Cheshire
%A Pierre Menard
%A Alexandra Carpentier
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal	
%F pmlr-v125-cheshire20a
%I PMLR
%P 1228--1275
%U https://proceedings.mlr.press/v125/cheshire20a.html
%V 125
%X  We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. On top of (i) the vanilla, unstructured \textit{TBP}, we consider the case where (ii) the sequence of arm’s means $(\mu_k)_k$ is monotonically increasing \textit{MTBP}, (iii) the case where $(\mu_k)_k$ is unimodal \textit{UTBP} and (iv) the case where $(\mu_k)_k$ is concave \textit{CTBP}. In the \textit{TBP} problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. In the fixed budget setting, we provide \emph{problem independent} minimax rates for the expected regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for \textit{TBP}, (ii) $\sqrt{\log(K)/T}$ for \textit{MTBP}, (iii) $\sqrt{K/T}$ for \textit{UTBP} and (iv) $\sqrt{\log\log K/T}$ for \textit{CTBP}, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the \textit{TBP}.

APA


Cheshire, J., Menard, P. & Carpentier, A.. (2020). The Influence of Shape Constraints on the Thresholding Bandit Problem. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:1228-1275 Available from https://proceedings.mlr.press/v125/cheshire20a.html.

Related Material

Download PDF