Multi-thresholding Good Arm Identification with Bandit Feedback
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:49-64, 2025.
Abstract
We consider a good arm identification problem in a stochastic bandit setting with multiple objectives, where each arm $i\in[K]$ is associated with a distribution $\mathcal{D}_i$ defined over $\mathbb{R}^M$. In each round $t$, the player/algorithm pulls one arm $i_t$ and receives an $M$-dimensional vector feedback sampled according to $\mathcal{D}_{i_t}$. The goal is twofold: one part is to find, with confidence parameter $\delta$ and accuracy rate $\epsilon$ within a bounded sample complexity, an arm whose means are all larger than the predefined thresholds $\xi_1,\ldots,\xi_M$; the other is to output $\bot$ to indicate that no such arm exists. We propose an algorithm with a sample complexity bound. Our bound matches the one given in previous work when $M=1$ and $\epsilon = 0$, and we give novel bounds for $M > 1$ and $\epsilon > 0$. The proposed algorithm attains better numerical performance than other baselines in experiments on synthetic and real datasets.
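To make the problem setup concrete, below is a minimal illustrative sketch (not the paper's proposed algorithm) of multi-threshold good arm identification with a naive round-robin sampling rule and Hoeffding-style anytime confidence bounds. The Gaussian reward model, the specific confidence radius, and the function name `multi_threshold_gai` are all assumptions for illustration; the stopping rule returns an arm only when its lower confidence bound clears every threshold up to accuracy $\epsilon$, and returns `None` (playing the role of $\bot$) once every arm is confidently below some threshold.

```python
import numpy as np

def multi_threshold_gai(means, xi, eps=0.05, delta=0.05, max_pulls=200_000, seed=0):
    """Illustrative sketch of multi-threshold good arm identification
    (hypothetical helper, NOT the paper's algorithm).

    Each arm i yields an M-dimensional Gaussian reward with mean means[i].
    Return an arm whose mean is confidently above xi - eps in every
    coordinate, or None (standing in for "bot") if all arms are eliminated.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    xi = np.asarray(xi, dtype=float)
    K, M = means.shape
    sums = np.zeros((K, M))
    counts = np.zeros(K, dtype=int)
    active = set(range(K))

    for _ in range(max_pulls):
        if not active:
            return None  # every arm confidently fails some threshold: output "bot"
        # naive sampling rule: pull the least-sampled active arm (round-robin)
        i = min(active, key=lambda a: counts[a])
        sums[i] += means[i] + rng.normal(size=M)  # noisy M-dimensional feedback
        counts[i] += 1
        mu_hat = sums[i] / counts[i]
        # anytime Hoeffding-style radius with a union bound over K arms,
        # M objectives, and all sample counts (constants chosen for illustration)
        rad = np.sqrt(2.0 * np.log(4.0 * K * M * counts[i] ** 2 / delta) / counts[i])
        if np.all(mu_hat - rad >= xi - eps):
            return i  # confidently (eps-)good in every objective
        if np.any(mu_hat + rad < xi):
            active.discard(i)  # confidently bad in at least one objective
    return None
```

For example, with `means = [[1.0, 1.0], [0.0, 0.0]]` and thresholds `xi = [0.5, 0.5]`, the sketch returns arm `0`, whose mean exceeds both thresholds; if no row of `means` dominated `xi`, it would eliminate all arms and return `None`.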