Simultaneous Inference for Massive Data: Distributed Bootstrap

Yang Yu; Shih-Kang Chao; Guang Cheng

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10892-10901, 2020.

Abstract

In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling, typically required by existing methods (Kleiner et al., 2014; Sengupta et al., 2016), while provably achieving optimal statistical efficiency with minimal communication. Our method does not require repeatedly re-fitting the model but only applies multiplier bootstrap in the master machine on the gradients received from the worker machines. Simulations validate our theory.

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-yu20a,
  title = 	 {Simultaneous Inference for Massive Data: Distributed Bootstrap},
  author =       {Yu, Yang and Chao, Shih-Kang and Cheng, Guang},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {10892--10901},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/yu20a/yu20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/yu20a.html},
  abstract = 	 {In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling, typically required by existing methods (Kleiner et al., 2014; Sengupta et al., 2016), while provably achieving optimal statistical efficiency with minimal communication. Our method does not require repeatedly re-fitting the model but only applies multiplier bootstrap in the master machine on the gradients received from the worker machines. Simulations validate our theory.}
}

Endnote

%0 Conference Paper
%T Simultaneous Inference for Massive Data: Distributed Bootstrap
%A Yang Yu
%A Shih-Kang Chao
%A Guang Cheng
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-yu20a
%I PMLR
%P 10892--10901
%U https://proceedings.mlr.press/v119/yu20a.html
%V 119
%X In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling, typically required by existing methods (Kleiner et al., 2014; Sengupta et al., 2016), while provably achieving optimal statistical efficiency with minimal communication. Our method does not require repeatedly re-fitting the model but only applies multiplier bootstrap in the master machine on the gradients received from the worker machines. Simulations validate our theory.

APA


Yu, Y., Chao, S.-K. & Cheng, G. (2020). Simultaneous Inference for Massive Data: Distributed Bootstrap. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10892-10901. Available from https://proceedings.mlr.press/v119/yu20a.html.
