Problem Dependent View on Structured Thresholding Bandit Problems

James Cheshire, Pierre Menard, Alexandra Carpentier
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1846-1854, 2021.

Abstract

We investigate the \textit{problem dependent regime} in the stochastic \emph{Thresholding Bandit problem} (\tbp) under several \emph{shape constraints}. In the \tbp the objective of the learner is to output, after interacting with the environment, the set of arms whose means are above a given threshold. The vanilla, unstructured, case is already well studied in the literature. Taking $K$ as the number of arms, we consider the case where (i) the sequence of arm’s means $(\mu_k){k=1}^K$ is monotonically increasing (\textit{MTBP}) and (ii) the case where $(\mu_k){k=1}^K$ is concave (\textit{CTBP}). We consider both cases in the \emph{problem dependent} regime and study the probability of error - i.e. the probability to mis-classify at least one arm. In the fixed budget setting, we provide nearly matching upper and lower bounds for the probability of error in both the concave and monotone settings, as well as associated algorithms. Of interest, is that for both the monotone and concave cases, optimal bounds on probability of error are of the same order as those for the two armed bandit problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-cheshire21a, title = {Problem Dependent View on Structured Thresholding Bandit Problems}, author = {Cheshire, James and Menard, Pierre and Carpentier, Alexandra}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {1846--1854}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/cheshire21a/cheshire21a.pdf}, url = {https://proceedings.mlr.press/v139/cheshire21a.html}, abstract = {We investigate the \textit{problem dependent regime} in the stochastic \emph{Thresholding Bandit problem} (\tbp) under several \emph{shape constraints}. In the \tbp the objective of the learner is to output, after interacting with the environment, the set of arms whose means are above a given threshold. The vanilla, unstructured, case is already well studied in the literature. Taking $K$ as the number of arms, we consider the case where (i) the sequence of arm’s means $(\mu_k){k=1}^K$ is monotonically increasing (\textit{MTBP}) and (ii) the case where $(\mu_k){k=1}^K$ is concave (\textit{CTBP}). We consider both cases in the \emph{problem dependent} regime and study the probability of error - i.e. the probability to mis-classify at least one arm. In the fixed budget setting, we provide nearly matching upper and lower bounds for the probability of error in both the concave and monotone settings, as well as associated algorithms. Of interest, is that for both the monotone and concave cases, optimal bounds on probability of error are of the same order as those for the two armed bandit problem.} }
Endnote
%0 Conference Paper %T Problem Dependent View on Structured Thresholding Bandit Problems %A James Cheshire %A Pierre Menard %A Alexandra Carpentier %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-cheshire21a %I PMLR %P 1846--1854 %U https://proceedings.mlr.press/v139/cheshire21a.html %V 139 %X We investigate the \textit{problem dependent regime} in the stochastic \emph{Thresholding Bandit problem} (\tbp) under several \emph{shape constraints}. In the \tbp the objective of the learner is to output, after interacting with the environment, the set of arms whose means are above a given threshold. The vanilla, unstructured, case is already well studied in the literature. Taking $K$ as the number of arms, we consider the case where (i) the sequence of arm’s means $(\mu_k){k=1}^K$ is monotonically increasing (\textit{MTBP}) and (ii) the case where $(\mu_k){k=1}^K$ is concave (\textit{CTBP}). We consider both cases in the \emph{problem dependent} regime and study the probability of error - i.e. the probability to mis-classify at least one arm. In the fixed budget setting, we provide nearly matching upper and lower bounds for the probability of error in both the concave and monotone settings, as well as associated algorithms. Of interest, is that for both the monotone and concave cases, optimal bounds on probability of error are of the same order as those for the two armed bandit problem.
APA
Cheshire, J., Menard, P. & Carpentier, A.. (2021). Problem Dependent View on Structured Thresholding Bandit Problems. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1846-1854 Available from https://proceedings.mlr.press/v139/cheshire21a.html.

Related Material