Scalable Hash-Based Estimation of Divergence Measures

Morteza Noshad, Alfred Hero
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1877-1885, 2018.

Abstract

We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables $X$ and $Y$ whose densities have bounded support. We apply a particular locality-sensitive random hashing and consider the ratio of samples in each hash bin that contains a non-zero number of $Y$ samples. We prove that the weighted average of these ratios over all of the hash bins converges to the f-divergence between the two sample sets. We derive MSE rates for two families of smooth functions: the Hölder smoothness class and differentiable functions. In particular, we prove that if the density functions have bounded derivatives up to order $d$, where $d$ is the dimension of the samples, the optimal parametric MSE rate of $O(1/N)$ can be achieved. The computational complexity is shown to be $O(N)$, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and can achieve the optimal parametric MSE estimation rate of $O(1/N)$.
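
To make the binning idea concrete, below is a minimal Python sketch of an epsilon-grid "hash" estimator, using the KL divergence (f(t) = t log t) as the f-divergence. The function name grid_hash_kl, the bin width eps, and the toy bounded-support data are illustrative assumptions; the sketch omits the weighting and bias-correction analyzed in the paper and should not be read as the authors' exact estimator.

```python
import numpy as np
from collections import defaultdict

def grid_hash_kl(x_samples, y_samples, eps=0.25):
    """Sketch of an epsilon-grid binning ("hashing") estimator of
    KL(p_X || p_Y), i.e. the f-divergence with f(t) = t*log(t).
    Illustrative only: no ensemble weighting or bias correction."""
    def cell(point):
        # Quantize each coordinate to an eps-wide grid cell (the "hash key").
        return tuple(np.floor(point / eps).astype(int))

    nx, ny = len(x_samples), len(y_samples)
    x_counts, y_counts = defaultdict(int), defaultdict(int)
    for p in x_samples:
        x_counts[cell(p)] += 1
    for q in y_samples:
        y_counts[cell(q)] += 1

    # Only bins containing Y samples contribute; in each bin, the density
    # ratio p_X/p_Y is estimated by the ratio of empirical bin frequencies.
    est = 0.0
    for c, ny_c in y_counts.items():
        nx_c = x_counts.get(c, 0)
        if nx_c == 0:
            continue
        ratio = (nx_c / nx) / (ny_c / ny)
        est += (ny_c / ny) * ratio * np.log(ratio)  # weighted average of f(ratio)
    return max(est, 0.0)

# Toy example on the bounded support [0,1]^2 (hypothetical data):
rng = np.random.default_rng(0)
X = rng.beta(2.0, 2.0, size=(20000, 2))   # X ~ product Beta(2,2)
Y = rng.random((20000, 2))                # Y ~ Uniform([0,1]^2)
print(grid_hash_kl(X, Y, eps=0.1))        # compare to the true KL of roughly 0.25
```

In this toy setup each sample is hashed in constant time, so the whole estimate costs O(N) operations, which is the scalability property highlighted in the abstract.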

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-noshad18a,
  title     = {Scalable Hash-Based Estimation of Divergence Measures},
  author    = {Noshad, Morteza and Hero, Alfred},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {1877--1885},
  year      = {2018},
  editor    = {Storkey, Amos and Perez-Cruz, Fernando},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/noshad18a/noshad18a.pdf},
  url       = {https://proceedings.mlr.press/v84/noshad18a.html},
  abstract  = {We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables $X$ and $Y$ whose densities have bounded support. We consider a particular locality sensitive random hashing, and consider the ratio of samples in each hash bin having non-zero numbers of Y samples. We prove that the weighted average of these ratios over all of the hash bins converges to f-divergences between the two samples sets. We derive the MSE rates for two families of smooth functions; the Hölder smoothness class and differentiable functions. In particular, it is proved that if the density functions have bounded derivatives up to the order $d$, where $d$ is the dimension of samples, the optimal parametric MSE rate of $O(1/N)$ can be achieved. The computational complexity is shown to be $O(N)$, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and can achieve the optimal parametric MSE estimation rate of $O(1/N)$.}
}
Endnote
%0 Conference Paper
%T Scalable Hash-Based Estimation of Divergence Measures
%A Morteza Noshad
%A Alfred Hero
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-noshad18a
%I PMLR
%P 1877--1885
%U https://proceedings.mlr.press/v84/noshad18a.html
%V 84
%X We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables $X$ and $Y$ whose densities have bounded support. We consider a particular locality sensitive random hashing, and consider the ratio of samples in each hash bin having non-zero numbers of Y samples. We prove that the weighted average of these ratios over all of the hash bins converges to f-divergences between the two samples sets. We derive the MSE rates for two families of smooth functions; the Hölder smoothness class and differentiable functions. In particular, it is proved that if the density functions have bounded derivatives up to the order $d$, where $d$ is the dimension of samples, the optimal parametric MSE rate of $O(1/N)$ can be achieved. The computational complexity is shown to be $O(N)$, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and can achieve the optimal parametric MSE estimation rate of $O(1/N)$.
APA
Noshad, M. & Hero, A. (2018). Scalable Hash-Based Estimation of Divergence Measures. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1877-1885. Available from https://proceedings.mlr.press/v84/noshad18a.html.