Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms
Proceedings of the 31st Conference On Learning Theory, PMLR 75:819-842, 2018.
Abstract
We study the problem of robustly learning multidimensional histograms. A $d$-dimensional function $h: D \to \R$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \R^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \to \R$ be a $d$-dimensional probability density function and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d=1$ case was well-understood, but significant gaps in our understanding remained even for $d=2$.
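To make the problem setup concrete, here is a minimal sketch of the easiest special case: learning a 1-dimensional $k$-histogram density on $[0,1)$ when the partition is *known* and uniform, via the empirical histogram estimator. This is an illustration of the learning objective and the $L_1$ metric only, not the paper's algorithm, which handles an unknown, non-uniform partition in $d$ dimensions; the bin count and sample size below are arbitrary choices.

```python
import random

def learn_histogram(samples, k):
    """Empirical k-histogram density estimate on [0, 1) with k uniform bins."""
    counts = [0] * k
    for x in samples:
        counts[min(int(x * k), k - 1)] += 1
    n = len(samples)
    # Density value on a bin of width 1/k: (fraction of samples) / (bin width)
    return [k * c / n for c in counts]

def l1_distance(h1, h2, k):
    """L1 distance between two densities, piecewise constant on the same k bins."""
    return sum(abs(a - b) / k for a, b in zip(h1, h2))

# Ground truth: a 2-histogram with density 0.5 on [0, 0.5) and 1.5 on [0.5, 1)
true_hist = [0.5, 1.5]

def sample_true(rng):
    # The first bin carries mass 0.5 * 0.5 = 0.25; otherwise sample the second
    if rng.random() < 0.25:
        return rng.random() * 0.5
    return 0.5 + rng.random() * 0.5

rng = random.Random(0)
samples = [sample_true(rng) for _ in range(20000)]
est = learn_histogram(samples, 2)
print(l1_distance(est, true_hist, 2))  # small; shrinks roughly like 1/sqrt(n)
```

Here $\mathrm{OPT} = 0$ since $f$ is itself a $2$-histogram, so the estimate converges to $f$ in $L_1$ as the sample size grows. The hard part the paper addresses is doing this with an unknown partition, in higher dimensions, and with near-optimal sample complexity $\tilde{O}_d(k/\epsilon^2)$.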