Bootstrap in High Dimension with Low Computation

Henry Lam, Zhenyuan Liu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:18419-18453, 2023.

Abstract

The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We study the use of bootstraps in high-dimensional environments with a small number of resamples. In particular, we show that with a recent "cheap" bootstrap perspective, using a number of resamples as small as one could attain valid coverage even when the dimension grows closely with the sample size, thus strongly supporting the implementability of the bootstrap for large-scale problems. We validate our theoretical results and compare the performance of our approach with other benchmarks via a range of experiments.
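The abstract refers to the "cheap" bootstrap perspective (Lam, 2022), in which a confidence interval is built from as few as B = 1 resamples by comparing the resample estimates to the full-data estimate via a t-distribution with B degrees of freedom. A minimal illustrative sketch of that construction, assuming the estimator takes a NumPy array and SciPy is available for the t quantile (function names here are our own, not from the paper):

```python
import numpy as np
from scipy import stats

def cheap_bootstrap_ci(data, estimator, B=1, alpha=0.05, seed=None):
    """Cheap-bootstrap (1 - alpha) confidence interval with B resamples.

    Sketch of the construction in Lam (2022): the interval is
    psi_hat +/- t_{B, 1-alpha/2} * S, where
    S^2 = (1/B) * sum_b (psi*_b - psi_hat)^2
    and psi*_b is the estimator applied to the b-th bootstrap resample.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    psi_hat = estimator(data)  # point estimate on the full data
    # Estimates on B independent resamples drawn with replacement
    resample_stats = np.array([
        estimator(data[rng.integers(0, n, size=n)]) for _ in range(B)
    ])
    s = np.sqrt(np.mean((resample_stats - psi_hat) ** 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=B)  # t critical value, B dof
    return psi_hat - t_crit * s, psi_hat + t_crit * s

# Example usage: interval for the mean from a single resample (B = 1)
data = np.random.default_rng(0).normal(loc=1.0, size=500)
lo, hi = cheap_bootstrap_ci(data, np.mean, B=1, seed=1)
```

Note that with B = 1 the critical value t_{1, 0.975} ≈ 12.7 is large, but the interval is still asymptotically exact; the paper's contribution is showing such coverage guarantees persist when the dimension grows nearly as fast as the sample size.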

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-lam23a,
  title     = {Bootstrap in High Dimension with Low Computation},
  author    = {Lam, Henry and Liu, Zhenyuan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {18419--18453},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/lam23a/lam23a.pdf},
  url       = {https://proceedings.mlr.press/v202/lam23a.html},
  abstract  = {The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We study the use of bootstraps in high-dimensional environments with a small number of resamples. In particular, we show that with a recent "cheap" bootstrap perspective, using a number of resamples as small as one could attain valid coverage even when the dimension grows closely with the sample size, thus strongly supporting the implementability of the bootstrap for large-scale problems. We validate our theoretical results and compare the performance of our approach with other benchmarks via a range of experiments.}
}
Endnote
%0 Conference Paper
%T Bootstrap in High Dimension with Low Computation
%A Henry Lam
%A Zhenyuan Liu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-lam23a
%I PMLR
%P 18419--18453
%U https://proceedings.mlr.press/v202/lam23a.html
%V 202
%X The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We study the use of bootstraps in high-dimensional environments with a small number of resamples. In particular, we show that with a recent "cheap" bootstrap perspective, using a number of resamples as small as one could attain valid coverage even when the dimension grows closely with the sample size, thus strongly supporting the implementability of the bootstrap for large-scale problems. We validate our theoretical results and compare the performance of our approach with other benchmarks via a range of experiments.
APA
Lam, H. & Liu, Z. (2023). Bootstrap in High Dimension with Low Computation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:18419-18453. Available from https://proceedings.mlr.press/v202/lam23a.html.