Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms
Proceedings of the 31st Conference On Learning Theory, PMLR 75:819-842, 2018.
Abstract
We study the problem of robustly learning multi-dimensional histograms. A $d$-dimensional function $h: D \to \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \to \mathbb{R}$ be a $d$-dimensional probability density function, and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$-close to $f$ in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d=1$ case was well understood, but significant gaps in our understanding remained even for $d=2$.
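To make the definition concrete, here is a minimal Python sketch (not taken from the paper) of a $k$-histogram density in $d=2$: a function that is constant on each of $k=3$ axis-aligned rectangles partitioning $[0,1]^2$. The partition, constants, and names (`rectangles`, `k_histogram`) are hypothetical and chosen only for illustration.

```python
# Hypothetical partition of [0, 1]^2 into k = 3 axis-aligned rectangles,
# each given as (x_lo, x_hi, y_lo, y_hi, constant value of h on that rectangle).
rectangles = [
    (0.0, 0.5, 0.0, 1.0, 1.2),   # left half
    (0.5, 1.0, 0.0, 0.5, 0.4),   # lower-right quadrant
    (0.5, 1.0, 0.5, 1.0, 1.2),   # upper-right quadrant
]

def k_histogram(x, y):
    """Evaluate the piecewise-constant function h at the point (x, y)."""
    for x_lo, x_hi, y_lo, y_hi, value in rectangles:
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            return value
    return 0.0  # outside the covered region (boundary handling is simplified here)

# Sanity check: the constants should integrate to 1 over the partition,
# so that h is a valid probability density.
total_mass = sum((x_hi - x_lo) * (y_hi - y_lo) * v
                 for x_lo, x_hi, y_lo, y_hi, v in rectangles)
print(total_mass)  # 1.0 for this example
```

In the learning problem of the abstract, the partition and the constant values are both unknown; the algorithm only sees $n = \tilde{O}_d(k/\epsilon^2)$ samples drawn from a density $f$ that is merely close to such a function in $L_1$-distance.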