A Multi-Armed Bandit Approach to Online Selection and Evaluation of Generative Models

Xiaoyan Hu, Ho-fung Leung, Farzan Farnia
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1864-1872, 2025.

Abstract

Existing frameworks for evaluating and comparing generative models consider an offline setting, where the evaluator has access to large batches of data produced by the models. However, in practical scenarios, the goal is often to identify and select the best model using the fewest possible generated samples to minimize the costs of querying data from the sub-optimal models. In this work, we propose an online evaluation and selection framework to find the generative model that maximizes a standard assessment score among a group of available models. We view the task as a multi-armed bandit (MAB) and propose upper confidence bound (UCB) bandit algorithms to identify the model producing data with the best evaluation score that quantifies the quality and diversity of generated data. Specifically, we develop the MAB-based selection of generative models considering the Fréchet Distance (FD) and Inception Score (IS) metrics, resulting in the FD-UCB and IS-UCB algorithms. We prove regret bounds for these algorithms and present numerical results on standard image datasets. Our empirical results suggest the efficacy of MAB approaches for the sample-efficient evaluation and selection of deep generative models. The project code is available at https://github.com/yannxiaoyanhu/dgm-online-eval.
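To illustrate the general idea behind UCB-based online model selection, the sketch below implements a generic UCB1 rule over a set of candidate "models", each abstracted as a function returning a per-sample score. This is a minimal illustration of the bandit framing, not the paper's FD-UCB or IS-UCB algorithms; the function names and the exploration constant `c` are assumptions for this example.

```python
import math
import random

def ucb_select(counts, means, t, c=2.0):
    """Pick the arm (model index) with the highest upper confidence bound."""
    # Play every arm once before applying the UCB rule.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]),
    )

def run_ucb(score_fns, horizon, seed=0):
    """Online selection over `horizon` rounds; returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(score_fns)
    counts = [0] * k      # samples queried from each model
    means = [0.0] * k     # empirical mean score of each model
    for t in range(1, horizon + 1):
        i = ucb_select(counts, means, t)
        reward = score_fns[i](rng)
        counts[i] += 1
        # Incremental update of the running mean.
        means[i] += (reward - means[i]) / counts[i]
    return counts

if __name__ == "__main__":
    # Two toy "models" emitting Bernoulli quality scores 0.9 vs 0.2.
    models = [
        lambda r: 1.0 if r.random() < 0.9 else 0.0,
        lambda r: 1.0 if r.random() < 0.2 else 0.0,
    ]
    pulls = run_ucb(models, horizon=500)
    print(pulls)  # the better model should receive most of the queries
```

Over time the rule concentrates queries on the model with the highest score while still occasionally probing the others, which is the sample-efficiency property the abstract appeals to.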

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-hu25a,
  title     = {A Multi-Armed Bandit Approach to Online Selection and Evaluation of Generative Models},
  author    = {Hu, Xiaoyan and Leung, Ho-fung and Farnia, Farzan},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {1864--1872},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/hu25a/hu25a.pdf},
  url       = {https://proceedings.mlr.press/v258/hu25a.html},
  abstract  = {Existing frameworks for evaluating and comparing generative models consider an offline setting, where the evaluator has access to large batches of data produced by the models. However, in practical scenarios, the goal is often to identify and select the best model using the fewest possible generated samples to minimize the costs of querying data from the sub-optimal models. In this work, we propose an online evaluation and selection framework to find the generative model that maximizes a standard assessment score among a group of available models. We view the task as a multi-armed bandit (MAB) and propose upper confidence bound (UCB) bandit algorithms to identify the model producing data with the best evaluation score that quantifies the quality and diversity of generated data. Specifically, we develop the MAB-based selection of generative models considering the Fr{\'e}chet Distance (FD) and Inception Score (IS) metrics, resulting in the FD-UCB and IS-UCB algorithms. We prove regret bounds for these algorithms and present numerical results on standard image datasets. Our empirical results suggest the efficacy of MAB approaches for the sample-efficient evaluation and selection of deep generative models. The project code is available at \url{https://github.com/yannxiaoyanhu/dgm-online-eval}.}
}
Endnote
%0 Conference Paper
%T A Multi-Armed Bandit Approach to Online Selection and Evaluation of Generative Models
%A Xiaoyan Hu
%A Ho-fung Leung
%A Farzan Farnia
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-hu25a
%I PMLR
%P 1864--1872
%U https://proceedings.mlr.press/v258/hu25a.html
%V 258
%X Existing frameworks for evaluating and comparing generative models consider an offline setting, where the evaluator has access to large batches of data produced by the models. However, in practical scenarios, the goal is often to identify and select the best model using the fewest possible generated samples to minimize the costs of querying data from the sub-optimal models. In this work, we propose an online evaluation and selection framework to find the generative model that maximizes a standard assessment score among a group of available models. We view the task as a multi-armed bandit (MAB) and propose upper confidence bound (UCB) bandit algorithms to identify the model producing data with the best evaluation score that quantifies the quality and diversity of generated data. Specifically, we develop the MAB-based selection of generative models considering the Fréchet Distance (FD) and Inception Score (IS) metrics, resulting in the FD-UCB and IS-UCB algorithms. We prove regret bounds for these algorithms and present numerical results on standard image datasets. Our empirical results suggest the efficacy of MAB approaches for the sample-efficient evaluation and selection of deep generative models. The project code is available at https://github.com/yannxiaoyanhu/dgm-online-eval.
APA
Hu, X., Leung, H. & Farnia, F. (2025). A Multi-Armed Bandit Approach to Online Selection and Evaluation of Generative Models. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1864-1872. Available from https://proceedings.mlr.press/v258/hu25a.html.

Related Material