A PTAS for the Bayesian Thresholding Bandit Problem

Jian Peng, Yue Qin, Yadi Wei, Yuan Zhou
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2455-2464, 2020.

Abstract

In this paper, we study the Bayesian thresholding bandit problem (BTBP), where the goal is to adaptively make a budget of $Q$ queries to $n$ stochastic arms and determine the label for each arm (whether its mean reward is closer to $0$ or $1$). We present a polynomial-time approximation scheme for the BTBP with runtime $O(f(\epsilon) + Q)$ that achieves expected labeling accuracy at least $(\opt(Q) - \epsilon)$, where $f(\cdot)$ is a function that depends only on $\epsilon$ and $\opt(Q)$ is the optimal expected accuracy achieved by any algorithm. For any fixed $\epsilon > 0$, our algorithm runs in time linear in $Q$. The main algorithmic ideas we use include the discretization employed in PTASs for many dynamic programming problems (such as Knapsack), as well as several problem-specific techniques, such as proving an upper bound on the number of queries made to any single arm by an almost-optimal policy and establishing a smoothness property of the $\opt(\cdot)$ curve.
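As a point of reference for the discretization idea the abstract alludes to, here is a minimal sketch (not from the paper) of the classic profit-scaling FPTAS for Knapsack: profits are rounded down to multiples of a granularity $\mu = \epsilon \cdot v_{\max} / n$, shrinking the dynamic-programming state space while losing at most an $\epsilon$ fraction of the optimum. All names below are illustrative.

```python
def knapsack_fptas(values, weights, capacity, eps):
    """(1 - eps)-approximation for 0/1 Knapsack via profit discretization."""
    n = len(values)
    vmax = max(values)
    mu = eps * vmax / n                      # discretization granularity
    scaled = [int(v // mu) for v in values]  # rounded-down integer profits
    top = sum(scaled)
    INF = float("inf")
    # min_weight[p] = least total weight achieving scaled profit exactly p
    min_weight = [0] + [INF] * top
    for sv, w in zip(scaled, weights):
        for p in range(top, sv - 1, -1):     # iterate downward: each item used once
            if min_weight[p - sv] + w < min_weight[p]:
                min_weight[p] = min_weight[p - sv] + w
    best = max(p for p in range(top + 1) if min_weight[p] <= capacity)
    return best * mu  # value of the approximate solution
```

The DP table has $O(n^2/\epsilon)$ entries regardless of how large the raw profits are, which is the essence of trading exactness for polynomial runtime.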

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-peng20a,
  title = {A PTAS for the Bayesian Thresholding Bandit Problem},
  author = {Peng, Jian and Qin, Yue and Wei, Yadi and Zhou, Yuan},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {2455--2464},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/peng20a/peng20a.pdf},
  url = {https://proceedings.mlr.press/v108/peng20a.html},
  abstract = {In this paper, we study the Bayesian thresholding bandit problem (BTBP), where the goal is to adaptively make a budget of $Q$ queries to $n$ stochastic arms and determine the label for each arm (whether its mean reward is closer to $0$ or $1$). We present a polynomial-time approximation scheme for the BTBP with runtime $O(f(\epsilon) + Q)$ that achieves expected labeling accuracy at least $(\opt(Q) - \epsilon)$, where $f(\cdot)$ is a function that depends only on $\epsilon$ and $\opt(Q)$ is the optimal expected accuracy achieved by any algorithm. For any fixed $\epsilon > 0$, our algorithm runs in time linear in $Q$. The main algorithmic ideas we use include the discretization employed in PTASs for many dynamic programming problems (such as Knapsack), as well as several problem-specific techniques, such as proving an upper bound on the number of queries made to any single arm by an almost-optimal policy and establishing a smoothness property of the $\opt(\cdot)$ curve.}
}
Endnote
%0 Conference Paper
%T A PTAS for the Bayesian Thresholding Bandit Problem
%A Jian Peng
%A Yue Qin
%A Yadi Wei
%A Yuan Zhou
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-peng20a
%I PMLR
%P 2455--2464
%U https://proceedings.mlr.press/v108/peng20a.html
%V 108
%X In this paper, we study the Bayesian thresholding bandit problem (BTBP), where the goal is to adaptively make a budget of $Q$ queries to $n$ stochastic arms and determine the label for each arm (whether its mean reward is closer to $0$ or $1$). We present a polynomial-time approximation scheme for the BTBP with runtime $O(f(\epsilon) + Q)$ that achieves expected labeling accuracy at least $(\opt(Q) - \epsilon)$, where $f(\cdot)$ is a function that depends only on $\epsilon$ and $\opt(Q)$ is the optimal expected accuracy achieved by any algorithm. For any fixed $\epsilon > 0$, our algorithm runs in time linear in $Q$. The main algorithmic ideas we use include the discretization employed in PTASs for many dynamic programming problems (such as Knapsack), as well as several problem-specific techniques, such as proving an upper bound on the number of queries made to any single arm by an almost-optimal policy and establishing a smoothness property of the $\opt(\cdot)$ curve.
APA
Peng, J., Qin, Y., Wei, Y. &amp; Zhou, Y. (2020). A PTAS for the Bayesian Thresholding Bandit Problem. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2455-2464. Available from https://proceedings.mlr.press/v108/peng20a.html.