Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2830-2840, 2020.

Abstract

We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings — often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-rogers20a,
  title = {Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis},
  author = {Rogers, Ryan and Roth, Aaron and Smith, Adam and Srebro, Nathan and Thakkar, Om and Woodworth, Blake},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {2830--2840},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/rogers20a/rogers20a.pdf},
  url = {https://proceedings.mlr.press/v108/rogers20a.html},
  abstract = {We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings — often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.}
}
Endnote
%0 Conference Paper
%T Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis
%A Ryan Rogers
%A Aaron Roth
%A Adam Smith
%A Nathan Srebro
%A Om Thakkar
%A Blake Woodworth
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-rogers20a
%I PMLR
%P 2830--2840
%U https://proceedings.mlr.press/v108/rogers20a.html
%V 108
%X We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings — often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.
APA
Rogers, R., Roth, A., Smith, A., Srebro, N., Thakkar, O. & Woodworth, B. (2020). Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2830-2840. Available from https://proceedings.mlr.press/v108/rogers20a.html.
