Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression

Benjamin Eyre, David Madras
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:15596-15612, 2025.

Abstract

The availability of machine learning systems that can effectively perform arbitrary tasks has led to synthetic labels from these systems being used in applications of statistical inference, such as data analysis or model evaluation. The Prediction-Powered Inference (PPI) framework provides a way of leveraging both a large pool of pseudo-labelled data and a small sample with real, high-quality labels to produce a low-variance, unbiased estimate of the quantity of interest. Most work on PPI considers a relatively sizable set of labelled samples, which can be resource-intensive to obtain. However, we find that when labelled data is scarce, the PPI++ method can perform even worse than classical inference. We analyze this phenomenon by relating PPI++ to ordinary least squares regression, which also exhibits high variance at small sample sizes, and use this regression framework to better understand the efficacy of PPI. Motivated by this, we present two new PPI-based techniques that leverage robust regressors to produce even lower-variance estimators in the few-label regime.
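For readers unfamiliar with the estimators the abstract compares, the following is a minimal sketch (in Python with NumPy) of a classical mean estimate alongside a power-tuned PPI++-style mean estimate; the variable names and the closed-form choice of the tuning weight are illustrative assumptions, not code from the paper.

# Minimal sketch: classical vs. power-tuned PPI++ mean estimation.
# Assumed inputs (illustrative names, not from the paper):
#   y        -- real labels on the small labelled sample (shape [n])
#   f_lab    -- model predictions on the same labelled sample (shape [n])
#   f_unlab  -- model predictions on the large unlabelled pool (shape [N])
import numpy as np

def classical_mean(y):
    # Baseline: use only the n real labels.
    return y.mean()

def ppi_pp_mean(y, f_lab, f_unlab):
    n, N = len(y), len(f_unlab)
    # Power-tuned weight: the OLS slope of y on f, shrunk by N / (N + n).
    # This slope-like form is what ties PPI++ to ordinary least squares regression.
    lam = (np.cov(y, f_lab, ddof=1)[0, 1] / np.var(f_lab, ddof=1)) * N / (N + n)
    # Rectified estimate: weighted predicted mean on the pool,
    # debiased using the labelled sample.
    return lam * f_unlab.mean() + (y - lam * f_lab).mean()

With only a handful of labels, the estimated weight lam is itself noisy, which is the regime where the abstract reports PPI++ can fall behind classical inference and which the paper's robust-regression variants target.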

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-eyre25a,
  title     = {Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression},
  author    = {Eyre, Benjamin and Madras, David},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {15596--15612},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/eyre25a/eyre25a.pdf},
  url       = {https://proceedings.mlr.press/v267/eyre25a.html},
  abstract  = {The availability of machine learning systems that can effectively perform arbitrary tasks has led to synthetic labels from these systems being used in applications of statistical inference, such as data analysis or model evaluation. The Prediction Powered Inference (PPI) framework provides a way of leveraging both a large pool of pseudo-labelled data and a small sample with real, high-quality labels to produce a low-variance, unbiased estimate of the quantity being evaluated for. Most work on PPI considers a relatively sizable set of labelled samples, which can be resource intensive to obtain. However, we find that when labelled data is scarce, the PPI++ method can perform even worse than classical inference. We analyze this phenomenon by relating PPI++ to ordinary least squares regression, which also experiences high variance with small sample sizes, and use this regression framework to better understand the efficacy of PPI. Motivated by this, we present two new PPI-based techniques that leverage robust regressors to produce even lower variance estimators in the few-label regime}
}
Endnote
%0 Conference Paper
%T Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression
%A Benjamin Eyre
%A David Madras
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-eyre25a
%I PMLR
%P 15596--15612
%U https://proceedings.mlr.press/v267/eyre25a.html
%V 267
%X The availability of machine learning systems that can effectively perform arbitrary tasks has led to synthetic labels from these systems being used in applications of statistical inference, such as data analysis or model evaluation. The Prediction Powered Inference (PPI) framework provides a way of leveraging both a large pool of pseudo-labelled data and a small sample with real, high-quality labels to produce a low-variance, unbiased estimate of the quantity being evaluated for. Most work on PPI considers a relatively sizable set of labelled samples, which can be resource intensive to obtain. However, we find that when labelled data is scarce, the PPI++ method can perform even worse than classical inference. We analyze this phenomenon by relating PPI++ to ordinary least squares regression, which also experiences high variance with small sample sizes, and use this regression framework to better understand the efficacy of PPI. Motivated by this, we present two new PPI-based techniques that leverage robust regressors to produce even lower variance estimators in the few-label regime
APA
Eyre, B. & Madras, D. (2025). Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:15596-15612. Available from https://proceedings.mlr.press/v267/eyre25a.html.
