Multi-thresholding Good Arm Identification with Bandit Feedback

Xuanke Jiang, Sherief Hashima, Kohei Hatano, Eiji Takimoto
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:49-64, 2025.

Abstract

We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i\\in[K]$ is associated with a distribution $\\mathcal\{D_\{i\}\}$ defined over $\\mathbb\{R\}^M$. For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $\\mathcal\{D_\{i_t\}\}$. The target is twofold, one is finding one arm whose means are larger than the predefined thresholds $\\xi_1,\\ldots,\\xi_M$ with a confidence bound $\\delta$ and an accuracy rate $\\epsilon$ with a bounded sample complexity, the other is output $\\bot$ to indicate no such arm exists. We propose an algorithm with a sample complexity bound. Our bound is the same as the one given in the previous work when $M=1$ and $\\epsilon = 0$, and we give novel bounds for $M > 1$ and $\\epsilon > 0$. The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-jiang25a, title = {Multi-thresholding Good Arm Identification with Bandit Feedback}, author = {Jiang, Xuanke and Hashima, Sherief and Hatano, Kohei and Takimoto, Eiji}, booktitle = {Proceedings of the 17th Asian Conference on Machine Learning}, pages = {49--64}, year = {2025}, editor = {Lee, Hung-yi and Liu, Tongliang}, volume = {304}, series = {Proceedings of Machine Learning Research}, month = {09--12 Dec}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/jiang25a/jiang25a.pdf}, url = {https://proceedings.mlr.press/v304/jiang25a.html}, abstract = {We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i\\in[K]$ is associated with a distribution $\\mathcal\{D_\{i\}\}$ defined over $\\mathbb\{R\}^M$. For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $\\mathcal\{D_\{i_t\}\}$. The target is twofold, one is finding one arm whose means are larger than the predefined thresholds $\\xi_1,\\ldots,\\xi_M$ with a confidence bound $\\delta$ and an accuracy rate $\\epsilon$ with a bounded sample complexity, the other is output $\\bot$ to indicate no such arm exists. We propose an algorithm with a sample complexity bound. Our bound is the same as the one given in the previous work when $M=1$ and $\\epsilon = 0$, and we give novel bounds for $M > 1$ and $\\epsilon > 0$. The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.} }
Endnote
%0 Conference Paper %T Multi-thresholding Good Arm Identification with Bandit Feedback %A Xuanke Jiang %A Sherief Hashima %A Kohei Hatano %A Eiji Takimoto %B Proceedings of the 17th Asian Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Hung-yi Lee %E Tongliang Liu %F pmlr-v304-jiang25a %I PMLR %P 49--64 %U https://proceedings.mlr.press/v304/jiang25a.html %V 304 %X We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i\\in[K]$ is associated with a distribution $\\mathcal\{D_\{i\}\}$ defined over $\\mathbb\{R\}^M$. For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $\\mathcal\{D_\{i_t\}\}$. The target is twofold, one is finding one arm whose means are larger than the predefined thresholds $\\xi_1,\\ldots,\\xi_M$ with a confidence bound $\\delta$ and an accuracy rate $\\epsilon$ with a bounded sample complexity, the other is output $\\bot$ to indicate no such arm exists. We propose an algorithm with a sample complexity bound. Our bound is the same as the one given in the previous work when $M=1$ and $\\epsilon = 0$, and we give novel bounds for $M > 1$ and $\\epsilon > 0$. The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.
APA
Jiang, X., Hashima, S., Hatano, K. & Takimoto, E.. (2025). Multi-thresholding Good Arm Identification with Bandit Feedback. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:49-64 Available from https://proceedings.mlr.press/v304/jiang25a.html.

Related Material