Data-driven confidence bands for distributed nonparametric regression

Valeriy Avanesov
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:300-322, 2020.

Abstract

Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity, rendering them inapplicable to modern massive datasets. To address this, a number of approximations have been suggested, some of which allow for a distributed implementation. One of them is the divide-and-conquer approach: split the data into a number of partitions, obtain the local estimates, and finally average them. In this paper we suggest a novel, computationally efficient, fully data-driven algorithm that quantifies the uncertainty of this method, yielding frequentist $L_2$-confidence bands. We rigorously demonstrate the validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds.
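The divide-and-conquer scheme mentioned in the abstract (partition, fit locally, average) can be sketched as follows. This is a minimal illustration, not the paper's confidence-band algorithm, which the abstract does not specify. It assumes one-dimensional inputs, Kernel Ridge Regression with a Gaussian kernel, a ridge term scaled as `n * lam` on each partition, and random near-equal partitions. The function names (`gaussian_kernel`, `krr_fit_predict`, `divide_and_conquer_krr`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def krr_fit_predict(x_train, y_train, x_test, lam, bandwidth):
    """Kernel ridge regression on one partition, evaluated at x_test.

    Solves (K + n * lam * I) alpha = y and predicts K(x_test, x_train) @ alpha.
    The n * lam scaling of the ridge term is an assumption of this sketch.
    """
    n = len(x_train)
    k = gaussian_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(k + n * lam * np.eye(n), y_train)
    return gaussian_kernel(x_test, x_train, bandwidth) @ alpha


def divide_and_conquer_krr(x, y, x_test, m, lam, bandwidth, seed=0):
    """Split the data into m random partitions, fit KRR on each, and average.

    Returns the averaged estimate at x_test and the m local estimates.
    """
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), m)
    local = np.array(
        [krr_fit_predict(x[idx], y[idx], x_test, lam, bandwidth) for idx in parts]
    )
    return local.mean(axis=0), local


# Usage on synthetic data (hypothetical example): noisy samples of sin(2*pi*x).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(2000)
x_test = np.linspace(0.0, 1.0, 50)
f_bar, local = divide_and_conquer_krr(x, y, x_test, m=10, lam=1e-3, bandwidth=0.1)
```

Each partition solves a system of size about n/m instead of n. This cost reduction motivates the approach. The spread of the `local` estimates around `f_bar` hints at the estimator's variability. Turning that spread into valid confidence bands is the problem the paper addresses, and this sketch does not attempt it.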

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-avanesov20a,
  title = {Data-driven confidence bands for distributed nonparametric regression},
  author = {Avanesov, Valeriy},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages = {300--322},
  year = {2020},
  editor = {Abernethy, Jacob and Agarwal, Shivani},
  volume = {125},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v125/avanesov20a/avanesov20a.pdf},
  url = {https://proceedings.mlr.press/v125/avanesov20a.html},
  abstract = { Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity rendering them inapplicable to the modern massive datasets. To that end a number of approximations have been suggested, some of them allowing for a distributed implementation. One of them is the divide and conquer approach, splitting the data into a number of partitions, obtaining the local estimates and finally averaging them. In this paper we suggest a novel computationally efficient fully data-driven algorithm, quantifying uncertainty of this method, yielding frequentist $L_2$-confidence bands. We rigorously demonstrate validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds.}
}
Endnote
%0 Conference Paper
%T Data-driven confidence bands for distributed nonparametric regression
%A Valeriy Avanesov
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-avanesov20a
%I PMLR
%P 300--322
%U https://proceedings.mlr.press/v125/avanesov20a.html
%V 125
%X Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity rendering them inapplicable to the modern massive datasets. To that end a number of approximations have been suggested, some of them allowing for a distributed implementation. One of them is the divide and conquer approach, splitting the data into a number of partitions, obtaining the local estimates and finally averaging them. In this paper we suggest a novel computationally efficient fully data-driven algorithm, quantifying uncertainty of this method, yielding frequentist $L_2$-confidence bands. We rigorously demonstrate validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds.
APA
Avanesov, V. (2020). Data-driven confidence bands for distributed nonparametric regression. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:300-322. Available from https://proceedings.mlr.press/v125/avanesov20a.html.
