Medoids in Almost-Linear Time via Multi-Armed Bandits

Vivek Bagaria; Govinda Kamath; Vasilis Ntranos; Martin Zhang; David Tse

Medoids in Almost-Linear Time via Multi-Armed Bandits

Vivek Bagaria, Govinda Kamath, Vasilis Ntranos, Martin Zhang, David Tse

Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:500-509, 2018.

Abstract

Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.

Cite this Paper

BibTeX


@InProceedings{pmlr-v84-bagaria18a,
  title = 	 {Medoids in Almost-Linear Time via Multi-Armed Bandits},
  author = 	 {Bagaria, Vivek and Kamath, Govinda and Ntranos, Vasilis and Zhang, Martin and Tse, David},
  booktitle = 	 {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages = 	 {500--509},
  year = 	 {2018},
  editor = 	 {Storkey, Amos and Perez-Cruz, Fernando},
  volume = 	 {84},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v84/bagaria18a/bagaria18a.pdf},
  url = 	 {https://proceedings.mlr.press/v84/bagaria18a.html},
  abstract = 	 {Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.}
}

Endnote

%0 Conference Paper
%T Medoids in Almost-Linear Time via Multi-Armed Bandits
%A Vivek Bagaria
%A Govinda Kamath
%A Vasilis Ntranos
%A Martin Zhang
%A David Tse
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz	
%F pmlr-v84-bagaria18a
%I PMLR
%P 500--509
%U https://proceedings.mlr.press/v84/bagaria18a.html
%V 84
%X Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.

APA


Bagaria, V., Kamath, G., Ntranos, V., Zhang, M. & Tse, D.. (2018). Medoids in Almost-Linear Time via Multi-Armed Bandits. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:500-509 Available from https://proceedings.mlr.press/v84/bagaria18a.html.

Medoids in Almost-Linear Time via Multi-Armed Bandits

Abstract

Cite this Paper

Related Material