Active Statistical Inference

Tijana Zrnic, Emmanuel Candes
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62993-63010, 2024.

Abstract

Inspired by the concept of active learning, we propose active inference—a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model’s predictions where it is confident. Active inference constructs valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful tests. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.
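The intuition in the abstract can be illustrated with a minimal sketch for one simple target, a population mean. This is an illustration under stated assumptions, not necessarily the authors' exact procedure: each point is labeled with probability proportional to a model-uncertainty score, and collected labels are inverse-probability weighted so the estimate stays unbiased. The names `active_mean_ci`, `predict`, `uncertainty`, `label`, and `budget` are hypothetical, the uncertainty score is assumed to be strictly positive, and the interval relies on a normal approximation.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def active_mean_ci(xs, predict, uncertainty, label, budget, alpha=0.1, rng=None):
    """Sketch: confidence interval for E[Y] from actively sampled labels.

    xs          -- unlabeled covariates
    predict     -- hypothetical model, x -> predicted label
    uncertainty -- hypothetical score, x -> positive number (larger = less sure)
    label       -- hypothetical oracle, x -> true label (uses one unit of budget)
    budget      -- upper bound on the expected number of labels collected
    """
    rng = rng or random.Random(0)
    n = len(xs)
    scores = [uncertainty(x) for x in xs]
    mean_score = fmean(scores)
    terms = []
    n_labeled = 0
    for x, u in zip(xs, scores):
        # Labeling probability proportional to uncertainty, capped at 1.
        p = min(1.0, budget / n * u / mean_score)
        f = predict(x)
        if rng.random() < p:
            # Label collected: correct the prediction, weighting by 1/p.
            n_labeled += 1
            terms.append(f + (label(x) - f) / p)
        else:
            # No label collected: rely on the model's prediction.
            terms.append(f)
    estimate = fmean(terms)
    # Normal-approximation interval at level 1 - alpha.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(terms) / math.sqrt(n)
    return estimate - half_width, estimate + half_width, n_labeled
```

In this sketch, the correction term vanishes wherever the model predicts well, so the interval narrows as predictions improve. Very small labeling probabilities inflate the variance, so a practical version would likely keep them bounded away from zero.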

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zrnic24a,
  title = {Active Statistical Inference},
  author = {Zrnic, Tijana and Candes, Emmanuel},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {62993--63010},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zrnic24a/zrnic24a.pdf},
  url = {https://proceedings.mlr.press/v235/zrnic24a.html},
  abstract = {Inspired by the concept of active learning, we propose active inference—a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model’s predictions where it is confident. Active inference constructs valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful tests. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.}
}
Endnote
%0 Conference Paper
%T Active Statistical Inference
%A Tijana Zrnic
%A Emmanuel Candes
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zrnic24a
%I PMLR
%P 62993--63010
%U https://proceedings.mlr.press/v235/zrnic24a.html
%V 235
%X Inspired by the concept of active learning, we propose active inference—a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model’s predictions where it is confident. Active inference constructs valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful tests. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.
APA
Zrnic, T. & Candes, E. (2024). Active Statistical Inference. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:62993-63010. Available from https://proceedings.mlr.press/v235/zrnic24a.html.
