Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

Shunta Nonaga, Koji Tabata, Yuta Mizuno, Tamiki Komatsuzaki
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:1182-1197, 2025.

Abstract

Decision making under uncertainty, in which one maximizes expected reward while minimizing the associated risk, is a ubiquitous problem across many fields. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing associated uncertainty, quantified via the mean-variance (MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, achieved through adaptive design of confidence intervals tailored to each scenario using the same sample exploration strategy. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations across synthetic benchmarks, demonstrating that our approach outperforms existing methods in terms of both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making tasks in uncertain environments.
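The Pareto-optimal arm set described above can be illustrated with a small sketch. This is not the paper's algorithm; it only shows the non-domination criterion the abstract refers to, assuming an arm is Pareto-optimal when no other arm has at least as high a mean and at most as high a variance, with one of the two strictly better:

```python
import numpy as np

def pareto_optimal_arms(means, variances):
    """Return indices of arms not dominated in the (mean, variance) sense.

    Arm b dominates arm a when b has mean >= a's and variance <= a's,
    with at least one inequality strict (no worse reward, no extra risk).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    optimal = []
    for a in range(len(means)):
        dominated = any(
            means[b] >= means[a] and variances[b] <= variances[a]
            and (means[b] > means[a] or variances[b] < variances[a])
            for b in range(len(means))
        )
        if not dominated:
            optimal.append(a)
    return optimal

# Three arms: high mean / high risk, low mean / low risk, and a dominated one.
means = [1.0, 0.5, 0.4]
variances = [0.9, 0.2, 0.5]
print(pareto_optimal_arms(means, variances))  # → [0, 1]
```

In an identification algorithm the true means and variances are unknown, so a sampling strategy estimates them from observed rewards and uses confidence intervals (designed per the fixed-confidence or fixed-budget regime, as the paper proposes) to decide when an arm can be safely included in or excluded from the returned set.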

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-nonaga25a,
  title     = {Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence},
  author    = {Nonaga, Shunta and Tabata, Koji and Mizuno, Yuta and Komatsuzaki, Tamiki},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {1182--1197},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/nonaga25a/nonaga25a.pdf},
  url       = {https://proceedings.mlr.press/v304/nonaga25a.html},
  abstract  = {Decision making under uncertain environments in the maximization of expected reward while minimizing its risk is one of the ubiquitous problems in many subjects. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing associated uncertainty, quantified via the \textit{mean-variance} (MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, achieved through adaptive design of confidence intervals tailored to each scenario using the same sample exploration strategy. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations across synthetic benchmarks, demonstrating that our approach outperforms existing methods in terms of both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making tasks in uncertain environments.}
}
Endnote
%0 Conference Paper
%T Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence
%A Shunta Nonaga
%A Koji Tabata
%A Yuta Mizuno
%A Tamiki Komatsuzaki
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-nonaga25a
%I PMLR
%P 1182--1197
%U https://proceedings.mlr.press/v304/nonaga25a.html
%V 304
%X Decision making under uncertain environments in the maximization of expected reward while minimizing its risk is one of the ubiquitous problems in many subjects. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing associated uncertainty, quantified via the mean-variance (MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, achieved through adaptive design of confidence intervals tailored to each scenario using the same sample exploration strategy. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations across synthetic benchmarks, demonstrating that our approach outperforms existing methods in terms of both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making tasks in uncertain environments.
APA
Nonaga, S., Tabata, K., Mizuno, Y., & Komatsuzaki, T. (2025). Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:1182-1197. Available from https://proceedings.mlr.press/v304/nonaga25a.html.