Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data

Ganggang Xu; Zuofeng Shang; Guang Cheng

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5483-5491, 2018.

Abstract

Divide-and-conquer is a powerful approach for large and massive data analysis. In the nonparametric regression setting, although various theoretical frameworks have been established to achieve optimality in estimation or hypothesis testing, how to choose the tuning parameter in a practically effective way is still an open problem. In this paper, we propose a data-driven procedure based on divide-and-conquer for selecting the tuning parameters in kernel ridge regression by modifying the popular Generalized Cross-validation (GCV; Wahba, 1990). The proposed criterion is computationally scalable for massive data sets, and it is also shown under mild conditions to be asymptotically optimal in the sense that minimizing the proposed distributed-GCV (dGCV) criterion is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.
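To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of (a) the classical GCV score for kernel ridge regression and (b) the averaged divide-and-conquer estimator. It is an illustration under stated assumptions, not the authors' code: the Gaussian kernel, its bandwidth, the simulated data, the grid of λ values, and all function names are hypothetical choices. The dGCV criterion itself is not reproduced, because the abstract does not state its formula. The sketch picks λ by classical GCV on a single subset only to have a value to plug in; choosing λ well for the averaged estimator is the problem the paper addresses.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def krr_fit(x, y, lam):
    """Solve (K + n*lam*I) alpha = y for the kernel ridge regression coefficients."""
    n = len(x)
    K = gaussian_kernel(x, x)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def gcv_score(x, y, lam):
    """Classical GCV: (1/n)||(I - A)y||^2 / (1 - tr(A)/n)^2 with A = K (K + n*lam*I)^{-1}."""
    n = len(x)
    K = gaussian_kernel(x, x)
    A = K @ np.linalg.inv(K + n * lam * np.eye(n))
    resid = y - A @ y
    return (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2

def dc_krr_predict(x, y, lam, m, x_new):
    """Average the predictions of m KRR fits, one per disjoint subset of the data."""
    preds = []
    for idx in np.array_split(np.arange(len(x)), m):
        alpha = krr_fit(x[idx], y[idx], lam)
        preds.append(gaussian_kernel(x_new, x[idx]) @ alpha)
    return np.mean(preds, axis=0)

# Simulated data (hypothetical): noisy sine curve on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(400)

# Classical GCV over a small grid, computed on one subset of 100 points.
lam_grid = 10.0 ** np.arange(-6, 1)
scores = np.array([gcv_score(x[:100], y[:100], lam) for lam in lam_grid])
best_lam = lam_grid[np.argmin(scores)]

# Averaged estimator over m = 4 subsets, evaluated away from the boundary.
x_new = np.linspace(0.05, 0.95, 50)
fit = dc_krr_predict(x, y, best_lam, m=4, x_new=x_new)
mse = np.mean((fit - np.sin(2.0 * np.pi * x_new)) ** 2)
```

Each subset fit only needs an n-by-n linear solve with n = N/m, which is where the computational saving of divide-and-conquer comes from.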

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-xu18f,
  title = 	 {Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data},
  author =       {Xu, Ganggang and Shang, Zuofeng and Cheng, Guang},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {5483--5491},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/xu18f/xu18f.pdf},
  url = 	 {https://proceedings.mlr.press/v80/xu18f.html},
  abstract = 	 {Divide-and-conquer is a powerful approach for large and massive data analysis. In the nonparametric regression setting, although various theoretical frameworks have been established to achieve optimality in estimation or hypothesis testing, how to choose the tuning parameter in a practically effective way is still an open problem. In this paper, we propose a data-driven procedure based on divide-and-conquer for selecting the tuning parameters in kernel ridge regression by modifying the popular Generalized Cross-validation (GCV, Wahba, 1990). While the proposed criterion is computationally scalable for massive data sets, it is also shown under mild conditions to be asymptotically optimal in the sense that minimizing the proposed distributed-GCV (dGCV) criterion is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.}
}

Endnote

%0 Conference Paper
%T Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data
%A Ganggang Xu
%A Zuofeng Shang
%A Guang Cheng
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-xu18f
%I PMLR
%P 5483--5491
%U https://proceedings.mlr.press/v80/xu18f.html
%V 80
%X Divide-and-conquer is a powerful approach for large and massive data analysis. In the nonparametric regression setting, although various theoretical frameworks have been established to achieve optimality in estimation or hypothesis testing, how to choose the tuning parameter in a practically effective way is still an open problem. In this paper, we propose a data-driven procedure based on divide-and-conquer for selecting the tuning parameters in kernel ridge regression by modifying the popular Generalized Cross-validation (GCV, Wahba, 1990). While the proposed criterion is computationally scalable for massive data sets, it is also shown under mild conditions to be asymptotically optimal in the sense that minimizing the proposed distributed-GCV (dGCV) criterion is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.

APA


Xu, G., Shang, Z., & Cheng, G. (2018). Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5483-5491. Available from https://proceedings.mlr.press/v80/xu18f.html.
