TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

Yi Hao; Ayush Jain; Alon Orlitsky; Vaishakh Ravindrakumar

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8427-8445, 2022.

Abstract

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$, and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$ and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.
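To make the benchmark in the abstract concrete: a common way to formalize "essentially at most a constant times larger" (assumed here, following the usual agnostic-learning convention rather than quoted from the paper) is $\|\hat f - f\|_1 \le c_{t,d}\cdot \mathrm{OPT}_{t,d}(f) + \varepsilon$, where $\mathrm{OPT}_{t,d}(f)$ is the $\ell_1$ distance from $f$ to its closest $t$-piece degree-$d$ polynomial. The Python sketch below computes $\mathrm{OPT}_{t,0}$ (the $d=0$, piecewise-constant case) by dynamic programming, assuming a discrete domain $\{0,\dots,n-1\}$ on which each piece is a contiguous interval carrying an arbitrary real constant. It illustrates only the comparison class, not the TURF estimator, and the function names are hypothetical.

```python
from statistics import median


def interval_cost(f, i, j):
    """l1 error of the best single constant fit to f[i:j].

    The median of the values minimizes sum(|f_k - c|) over constants c.
    """
    seg = f[i:j]
    c = median(seg)
    return sum(abs(x - c) for x in seg)


def best_t_piece_constant_error(f, t):
    """Smallest l1 distance from f to a function that is constant on each of
    at most t contiguous intervals of {0, ..., len(f) - 1}.

    Hypothetical helper: this is OPT_{t,0} under the discrete-domain
    assumption, where each piece may take any real constant.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    n = len(f)
    inf = float("inf")

    # Naive precomputation of the one-piece cost on every interval [i, j).
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = interval_cost(f, i, j)

    # prev[j]: best error covering f[0:j] with at most k pieces after k rounds.
    prev = [0.0] + [inf] * n
    for _ in range(t):
        cur = prev[:]  # using fewer pieces is always allowed
        for j in range(1, n + 1):
            for i in range(j):
                # Last piece covers [i, j); the prefix f[0:i] uses fewer pieces.
                cur[j] = min(cur[j], prev[i] + cost[i][j])
        prev = cur
    return prev[n]


if __name__ == "__main__":
    f = [0.1, 0.1, 0.4, 0.4]
    print(best_t_piece_constant_error(f, 1))  # approximately 0.6 (float rounding)
    print(best_t_piece_constant_error(f, 2))  # 0.0
```

The median is used for each piece because it minimizes the per-interval $\ell_1$ error; the naive interval-cost precomputation keeps the sketch short but is practical only for small $n$.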

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-hao22a,
  title = 	 {{TURF}: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm},
  author =       {Hao, Yi and Jain, Ayush and Orlitsky, Alon and Ravindrakumar, Vaishakh},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {8427--8445},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/hao22a/hao22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/hao22a.html},
  abstract = 	 {Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$, and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$ and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.}
}

Endnote

%0 Conference Paper
%T TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm
%A Yi Hao
%A Ayush Jain
%A Alon Orlitsky
%A Vaishakh Ravindrakumar
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-hao22a
%I PMLR
%P 8427--8445
%U https://proceedings.mlr.press/v162/hao22a.html
%V 162
%X Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$, and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$ and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.

APA


Hao, Y., Jain, A., Orlitsky, A. & Ravindrakumar, V. (2022). TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:8427-8445. Available from https://proceedings.mlr.press/v162/hao22a.html.
