A PTAS for the Bayesian Thresholding Bandit Problem
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2455-2464, 2020.
In this paper, we study the Bayesian thresholding bandit problem (BTBP), where the goal is to adaptively make a budget of $Q$ queries to $n$ stochastic arms and determine the label for each arm (whether its mean reward is closer to $0$ or $1$). We present a polynomial-time approximation scheme for the BTBP with runtime $O(f(\epsilon) + Q)$ that achieves expected labeling accuracy at least $(\opt(Q) - \epsilon)$, where $f(\cdot)$ is a function that only depends on $\epsilon$ and $\opt(Q)$ is the optimal expected accuracy achieved by any algorithm. For any fixed $\epsilon > 0$, our algorithm runs in time linear with $Q$. The main algorithmic ideas we use include discretization employed in the PTASs for many dynamic programming algorithms (such as Knapsack), as well as many problem specific techniques such as proving an upper bound on the number of query numbers for any arm made by an almost optimal policy, and establishing the smoothness property of the $\opt(\cdot)$ curve, etc.