Multi-thresholding Good Arm Identification with Bandit Feedback
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:49-64, 2025.
Abstract
We consider a good arm identification problem in a stochastic bandit setting with multiple objectives, where each arm $i\in[K]$ is associated with a distribution $\mathcal{D}_i$ defined over $\mathbb{R}^M$. In each round $t$, the player/algorithm pulls one arm $i_t$ and receives an $M$-dimensional vector feedback sampled according to $\mathcal{D}_{i_t}$. The goal is twofold: one part is to find, with confidence parameter $\delta$ and accuracy rate $\epsilon$ within a bounded sample complexity, an arm whose means are all larger than the predefined thresholds $\xi_1,\ldots,\xi_M$; the other is to output $\bot$ to indicate that no such arm exists. We propose an algorithm with a sample complexity bound. Our bound matches the one given in previous work when $M=1$ and $\epsilon = 0$, and we give novel bounds for $M > 1$ and $\epsilon > 0$. The proposed algorithm attains better numerical performance than other baselines in experiments on synthetic and real datasets.
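To make the problem setup concrete, below is a minimal illustrative sketch (not the paper's proposed algorithm) of multi-threshold good arm identification with a naive round-robin sampling rule and Hoeffding-style anytime confidence bounds. The Gaussian reward model, the specific confidence radius, and the function name `multi_threshold_gai` are all assumptions for illustration; the stopping rule returns an arm only when its lower confidence bound clears every threshold up to accuracy $\epsilon$, and returns `None` (playing the role of $\bot$) once every arm is confidently below some threshold.

```python
import numpy as np

def multi_threshold_gai(means, xi, eps=0.05, delta=0.05, max_pulls=200_000, seed=0):
    """Illustrative sketch of multi-threshold good arm identification
    (hypothetical helper, NOT the paper's algorithm).

    Each arm i yields an M-dimensional Gaussian reward with mean means[i].
    Return an arm whose mean is confidently above xi - eps in every
    coordinate, or None (standing in for "bot") if all arms are eliminated.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    xi = np.asarray(xi, dtype=float)
    K, M = means.shape
    sums = np.zeros((K, M))
    counts = np.zeros(K, dtype=int)
    active = set(range(K))

    for _ in range(max_pulls):
        if not active:
            return None  # every arm confidently fails some threshold: output "bot"
        # naive sampling rule: pull the least-sampled active arm (round-robin)
        i = min(active, key=lambda a: counts[a])
        sums[i] += means[i] + rng.normal(size=M)  # noisy M-dimensional feedback
        counts[i] += 1
        mu_hat = sums[i] / counts[i]
        # anytime Hoeffding-style radius with a union bound over K arms,
        # M objectives, and all sample counts (constants chosen for illustration)
        rad = np.sqrt(2.0 * np.log(4.0 * K * M * counts[i] ** 2 / delta) / counts[i])
        if np.all(mu_hat - rad >= xi - eps):
            return i  # confidently (eps-)good in every objective
        if np.any(mu_hat + rad < xi):
            active.discard(i)  # confidently bad in at least one objective
    return None
```

For example, with `means = [[1.0, 1.0], [0.0, 0.0]]` and thresholds `xi = [0.5, 0.5]`, the sketch returns arm `0`, whose mean exceeds both thresholds; if no row of `means` dominated `xi`, it would eliminate all arms and return `None`.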