Medoids in Almost-Linear Time via Multi-Armed Bandits
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:500-509, 2018.
Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.