AutoEval Done Right: Using Synthetic Data for Model Evaluation

Pierre Boyeau, Anastasios Nikolas Angelopoulos, Tianle Li, Nir Yosef, Jitendra Malik, Michael I. Jordan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:5276-5290, 2025.

Abstract

The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-boyeau25a,
  title = {{A}uto{E}val Done Right: Using Synthetic Data for Model Evaluation},
  author = {Boyeau, Pierre and Angelopoulos, Anastasios Nikolas and Li, Tianle and Yosef, Nir and Malik, Jitendra and Jordan, Michael I.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {5276--5290},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/boyeau25a/boyeau25a.pdf},
  url = {https://proceedings.mlr.press/v267/boyeau25a.html},
  abstract = {The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased.}
}
Endnote
%0 Conference Paper
%T AutoEval Done Right: Using Synthetic Data for Model Evaluation
%A Pierre Boyeau
%A Anastasios Nikolas Angelopoulos
%A Tianle Li
%A Nir Yosef
%A Jitendra Malik
%A Michael I. Jordan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-boyeau25a
%I PMLR
%P 5276--5290
%U https://proceedings.mlr.press/v267/boyeau25a.html
%V 267
%X The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased.
APA
Boyeau, P., Angelopoulos, A.N., Li, T., Yosef, N., Malik, J. & Jordan, M.I. (2025). AutoEval Done Right: Using Synthetic Data for Model Evaluation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:5276-5290. Available from https://proceedings.mlr.press/v267/boyeau25a.html.
