That was fast! Speeding up NN search of high dimensional distributions.

Emanuele Coviello, Adeel Mumtaz, Antoni Chan, Gert Lanckriet
; Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):468-476, 2013.

Abstract

We present a data structure for fast nearest neighbor retrieval of generative models of documents based on KL divergence. Our data structure, which shares some similarity with Bregman Ball Trees, consists of a hierarchical partition of a database, and uses a novel branch and bound methodology for search. The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation. This reduces the number of computations per node, and overcomes the limitations of Bregman Ball Trees on high dimensional data. In addition, our strategy is applicable also to probability distributions with hidden state variables, and is not limited to regular exponential family distributions. Experiments demonstrate substantial speed-ups over both Bregman Ball Trees and over brute force search, on both moderate and high dimensional histogram data. In addition, experiments on linear dynamical systems demonstrate the flexibility of our approach to latent variable models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-coviello13, title = {That was fast! Speeding up NN search of high dimensional distributions.}, author = {Emanuele Coviello and Adeel Mumtaz and Antoni Chan and Gert Lanckriet}, booktitle = {Proceedings of the 30th International Conference on Machine Learning}, pages = {468--476}, year = {2013}, editor = {Sanjoy Dasgupta and David McAllester}, volume = {28}, number = {3}, series = {Proceedings of Machine Learning Research}, address = {Atlanta, Georgia, USA}, month = {17--19 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v28/coviello13.pdf}, url = {http://proceedings.mlr.press/v28/coviello13.html}, abstract = {We present a data structure for fast nearest neighbor retrieval of generative models of documents based on KL divergence. Our data structure, which shares some similarity with Bregman Ball Trees, consists of a hierarchical partition of a database, and uses a novel branch and bound methodology for search. The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation. This reduces the number of computations per node, and overcomes the limitations of Bregman Ball Trees on high dimensional data. In addition, our strategy is applicable also to probability distributions with hidden state variables, and is not limited to regular exponential family distributions. Experiments demonstrate substantial speed-ups over both Bregman Ball Trees and over brute force search, on both moderate and high dimensional histogram data. In addition, experiments on linear dynamical systems demonstrate the flexibility of our approach to latent variable models.} }
Endnote
%0 Conference Paper %T That was fast! Speeding up NN search of high dimensional distributions. %A Emanuele Coviello %A Adeel Mumtaz %A Antoni Chan %A Gert Lanckriet %B Proceedings of the 30th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2013 %E Sanjoy Dasgupta %E David McAllester %F pmlr-v28-coviello13 %I PMLR %J Proceedings of Machine Learning Research %P 468--476 %U http://proceedings.mlr.press %V 28 %N 3 %W PMLR %X We present a data structure for fast nearest neighbor retrieval of generative models of documents based on KL divergence. Our data structure, which shares some similarity with Bregman Ball Trees, consists of a hierarchical partition of a database, and uses a novel branch and bound methodology for search. The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation. This reduces the number of computations per node, and overcomes the limitations of Bregman Ball Trees on high dimensional data. In addition, our strategy is applicable also to probability distributions with hidden state variables, and is not limited to regular exponential family distributions. Experiments demonstrate substantial speed-ups over both Bregman Ball Trees and over brute force search, on both moderate and high dimensional histogram data. In addition, experiments on linear dynamical systems demonstrate the flexibility of our approach to latent variable models.
RIS
TY - CPAPER TI - That was fast! Speeding up NN search of high dimensional distributions. AU - Emanuele Coviello AU - Adeel Mumtaz AU - Antoni Chan AU - Gert Lanckriet BT - Proceedings of the 30th International Conference on Machine Learning PY - 2013/02/13 DA - 2013/02/13 ED - Sanjoy Dasgupta ED - David McAllester ID - pmlr-v28-coviello13 PB - PMLR SP - 468 DP - PMLR EP - 476 L1 - http://proceedings.mlr.press/v28/coviello13.pdf UR - http://proceedings.mlr.press/v28/coviello13.html AB - We present a data structure for fast nearest neighbor retrieval of generative models of documents based on KL divergence. Our data structure, which shares some similarity with Bregman Ball Trees, consists of a hierarchical partition of a database, and uses a novel branch and bound methodology for search. The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation. This reduces the number of computations per node, and overcomes the limitations of Bregman Ball Trees on high dimensional data. In addition, our strategy is applicable also to probability distributions with hidden state variables, and is not limited to regular exponential family distributions. Experiments demonstrate substantial speed-ups over both Bregman Ball Trees and over brute force search, on both moderate and high dimensional histogram data. In addition, experiments on linear dynamical systems demonstrate the flexibility of our approach to latent variable models. ER -
APA
Coviello, E., Mumtaz, A., Chan, A. & Lanckriet, G.. (2013). That was fast! Speeding up NN search of high dimensional distributions.. Proceedings of the 30th International Conference on Machine Learning, in PMLR 28(3):468-476

Related Material