Medoids in Almost-Linear Time via Multi-Armed Bandits

Vivek Bagaria, Govinda Kamath, Vasilis Ntranos, Martin Zhang, David Tse
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:500-509, 2018.

Abstract

Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-bagaria18a, title = {Medoids in Almost-Linear Time via Multi-Armed Bandits}, author = {Bagaria, Vivek and Kamath, Govinda and Ntranos, Vasilis and Zhang, Martin and Tse, David}, booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics}, pages = {500--509}, year = {2018}, editor = {Storkey, Amos and Perez-Cruz, Fernando}, volume = {84}, series = {Proceedings of Machine Learning Research}, month = {09--11 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v84/bagaria18a/bagaria18a.pdf}, url = {https://proceedings.mlr.press/v84/bagaria18a.html}, abstract = {Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.} }
Endnote
%0 Conference Paper %T Medoids in Almost-Linear Time via Multi-Armed Bandits %A Vivek Bagaria %A Govinda Kamath %A Vasilis Ntranos %A Martin Zhang %A David Tse %B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2018 %E Amos Storkey %E Fernando Perez-Cruz %F pmlr-v84-bagaria18a %I PMLR %P 500--509 %U https://proceedings.mlr.press/v84/bagaria18a.html %V 84 %X Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.
APA
Bagaria, V., Kamath, G., Ntranos, V., Zhang, M. & Tse, D.. (2018). Medoids in Almost-Linear Time via Multi-Armed Bandits. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:500-509 Available from https://proceedings.mlr.press/v84/bagaria18a.html.

Related Material