Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal

Deheng Yuan, Tao Guo, Zhongyi Huang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:73478-73510, 2025.

Abstract

Consider the communication-constrained problem of nonparametric function estimation, in which each distributed terminal holds multiple i.i.d. samples. Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models.

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-yuan25b,
  title = {Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal},
  author = {Yuan, Deheng and Guo, Tao and Huang, Zhongyi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {73478--73510},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yuan25b/yuan25b.pdf},
  url = {https://proceedings.mlr.press/v267/yuan25b.html},
  abstract = {Consider the communication-constrained problem of nonparametric function estimation, in which each distributed terminal holds multiple i.i.d. samples. Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models.}
}
Endnote
%0 Conference Paper
%T Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal
%A Deheng Yuan
%A Tao Guo
%A Zhongyi Huang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-yuan25b
%I PMLR
%P 73478--73510
%U https://proceedings.mlr.press/v267/yuan25b.html
%V 267
%X Consider the communication-constrained problem of nonparametric function estimation, in which each distributed terminal holds multiple i.i.d. samples. Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models.
APA
Yuan, D., Guo, T. & Huang, Z. (2025). Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:73478-73510. Available from https://proceedings.mlr.press/v267/yuan25b.html.