Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms
Proceedings of the 31st Conference On Learning Theory, PMLR 75:819-842, 2018.
Abstract
We study the problem of robustly learning multi-dimensional histograms. A d-dimensional function h: D → ℝ is called a k-histogram if there exists a partition of the domain D ⊆ ℝ^d into k axis-aligned rectangles such that h is constant within each such rectangle. Let f: D → ℝ be a d-dimensional probability density function and suppose that f is OPT-close, in L_1-distance, to an unknown k-histogram (with unknown partition). Our goal is to output a hypothesis that is O(OPT) + ε close to f, in L_1-distance. We give an algorithm for this learning problem that uses n = Õ_d(k/ε²) samples and runs in time Õ_d(n). For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the d = 1 case was well understood, but significant gaps in our understanding remained even for d = 2.
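To make the central definition concrete, here is a minimal Python sketch (not from the paper; the names Rect and Histogram are illustrative) of a d-dimensional k-histogram: a function that is constant on each of k axis-aligned rectangles partitioning its domain.

```python
# Illustrative sketch of a d-dimensional k-histogram, assuming a
# half-open rectangle convention lo <= x < hi in each coordinate.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Rect:
    lo: Tuple[float, ...]  # lower corner, one coordinate per dimension
    hi: Tuple[float, ...]  # upper corner (half-open: lo <= x < hi)

    def contains(self, x: Tuple[float, ...]) -> bool:
        return all(l <= xi < h for l, xi, h in zip(self.lo, x, self.hi))

@dataclass
class Histogram:
    pieces: List[Tuple[Rect, float]]  # k rectangles, each with a constant value

    def __call__(self, x: Tuple[float, ...]) -> float:
        # The rectangles partition the domain, so exactly one piece matches.
        for rect, value in self.pieces:
            if rect.contains(x):
                return value
        raise ValueError("x lies outside the domain partition")

# Example: a 2-histogram on [0,1)^2, split at first coordinate 0.5.
h = Histogram([
    (Rect((0.0, 0.0), (0.5, 1.0)), 1.6),
    (Rect((0.5, 0.0), (1.0, 1.0)), 0.4),
])
assert h((0.25, 0.7)) == 1.6 and h((0.9, 0.1)) == 0.4
```

In the learning problem above, such a Histogram (with its partition unknown to the learner) plays the role of the density that f is OPT-close to in L_1-distance.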