TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8427-8445, 2022.
Abstract
Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an ℓ1 distance essentially at most a constant times larger than that of its closest t-piece degree-d polynomial, where t ≥ 1 and d ≥ 0. Letting c_{t,d} denote the smallest such factor, clearly c_{1,0} = 1, and it can be shown that c_{t,d} ≥ 2 for all other t and d. Yet current computationally efficient algorithms show only c_{t,1} ≤ 2.25, and the bound rises quickly to c_{t,d} ≤ 3 for d ≥ 9. We derive a near-linear-time and essentially sample-optimal estimator that establishes c_{t,d} = 2 for all (t,d) ≠ (1,0). Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying numbers of pieces. We provide a method that estimates this number near-optimally, thus helping approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.
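To make the objective concrete, the sketch below illustrates what "approximating a density by a t-piece degree-d polynomial in ℓ1 distance" means. It is not the paper's TURF estimator: it simply fits each of t equal-width pieces by least squares (the paper instead derives a sample-optimal, near-linear-time estimator and optimizes the partition), and all function and parameter names are our own illustrative choices.

```python
import numpy as np

def fit_piecewise_poly(samples, t=4, d=2, grid_size=200):
    """Illustrative sketch only: fit a t-piece degree-d polynomial to the
    empirical density of the samples, one equal-width piece at a time,
    via ordinary least squares. Returns the grid, the empirical density,
    and the fitted piecewise-polynomial density on that grid."""
    lo, hi = samples.min(), samples.max()
    # Empirical density via a fine histogram over the sample range.
    hist, edges = np.histogram(samples, bins=grid_size,
                               range=(lo, hi), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    fitted = np.empty_like(hist)
    # Equal-width pieces; the paper's method chooses the partition adaptively.
    for piece in np.array_split(np.arange(grid_size), t):
        coeffs = np.polyfit(centers[piece], hist[piece], d)
        fitted[piece] = np.polyval(coeffs, centers[piece])
    return centers, hist, fitted

def l1_distance(p, q, dx):
    """Discrete approximation of the L1 distance between two densities
    sampled on a uniform grid with spacing dx."""
    return np.sum(np.abs(p - q)) * dx
```

For example, on Gaussian samples the 4-piece quadratic fit smooths the noisy histogram, and its ℓ1 distance to the empirical density is far below the trivial bound of 1 (the total mass).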