Sparse Randomized Partition Trees for Nearest Neighbor Search
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:681-689, 2017.
Randomized partition trees have recently been shown to be very effective in solving nearest neighbor search problem. In spite of enjoying strong theoretical guarantee, it suffers from high space complexity, since each internal node of the tree needs to store a $d$ dimensional projection direction leading to a $O(nd)$ space complexity for a dataset of size $n$. Inspired by the fast Johnson-Lindenstrauss transform, in this paper, we propose a sparse version of randomized partition tree where each internal node needs to store only a few non-zero entries, as opposed to all $d$ entries, leading to significant space savings without sacrificing much in terms of nearest neighbor search accuracy. As a by product of this, query time of our proposed method is slightly better than that of its non-sparse counterpart for large dataset size. Our theoretical results indicate that our proposed method enjoys the same theoretical guarantee as that of the original non-sparse RP-tree. Experimental evaluations on four real world dataset strongly suggest that nearest neighbor search performance of our proposed sparse RP-tree is very similar to that of its non-sparse counterpart in terms of accuracy and number of retrieved points.