Learning Sparse Metrics, One Feature at a Time

Yuval Atzmon; Uri Shalit; Gal Chechik

Learning Sparse Metrics, One Feature at a Time

Yuval Atzmon, Uri Shalit, Gal Chechik

Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, PMLR 44:30-48, 2015.

Abstract

Learning distance metrics from data is a fundamental problem in machine learning and useful way to extract data-driven features by using the matrix root of a distance matrix. Finding a proper metric amounts to optimization over the cone of positive definite (PD) matrices. This optimization is difficult since restricting optimization to remain within the PD cone or repeatedly projecting to the cone is prohibitively costly. Here we describe COMET, a block-coordinate descent procedure, which efficiently keeps the search within the PD cone, avoiding both costly projections and unnecessary computation of full gradients. COMET also continuously maintains the Cholesky root of the matrix, providing feature extraction and embedding of samples in a metric space. We further develop a structurally sparse variant of COMET, where only a small number of features interacts with other features. Sparse-COMET significantly accelerates both training and inference while improving interpretability. As a block-coordinate descent procedure, COMET has fast convergence bounds showing linear convergence with high probability. When tested on benchmark datasets in a task of retrieving similar images and similar text documents, COMET has significantly better precision than competing projection-free methods. Furthermore, sparse-COMET achieves almost identical precision as dense-COMET in document classification, while running 4.5 times faster, maintaining a 0.5% sparsity level, and outperforming competing methods both in precision and in run time.

Cite this Paper

BibTeX


@InProceedings{pmlr-v44-atzmon2015,
  title = 	 {Learning Sparse Metrics, One Feature at a Time},
  author = 	 {Atzmon, Yuval and Shalit, Uri and Chechik, Gal},
  booktitle = 	 {Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015},
  pages = 	 {30--48},
  year = 	 {2015},
  editor = 	 {Storcheus, Dmitry and Rostamizadeh, Afshin and Kumar, Sanjiv},
  volume = 	 {44},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Montreal, Canada},
  month = 	 {11 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v44/atzmon2015.pdf},
  url = 	 {https://proceedings.mlr.press/v44/atzmon2015.html},
  abstract = 	 {Learning distance metrics from data is a fundamental problem in machine learning and useful way to extract data-driven features by using the matrix root of a distance matrix. Finding a proper metric amounts to optimization over the cone of positive definite (PD) matrices. This optimization is difficult since restricting optimization to remain within the PD cone or repeatedly projecting to the cone is prohibitively costly. Here we describe COMET, a block-coordinate descent procedure, which efficiently keeps the search within the PD cone, avoiding both costly projections and unnecessary computation of full gradients. COMET also continuously maintains the Cholesky root of the matrix, providing feature extraction and embedding of samples in a metric space. We further develop a structurally sparse variant of COMET, where only a small number of features interacts with other features. Sparse-COMET significantly accelerates both training and inference while improving interpretability. As a block-coordinate descent procedure, COMET has fast convergence bounds showing linear convergence with high probability. When tested on benchmark datasets in a task of retrieving similar images and similar text documents, COMET has significantly better precision than competing projection-free methods. Furthermore, sparse-COMET achieves almost identical precision as dense-COMET in document classification, while running 4.5 times faster, maintaining a 0.5% sparsity level, and outperforming competing methods both in precision and in run time. }
}

Endnote

%0 Conference Paper
%T Learning Sparse Metrics, One Feature at a Time
%A Yuval Atzmon
%A Uri Shalit
%A Gal Chechik
%B Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
%C Proceedings of Machine Learning Research
%D 2015
%E Dmitry Storcheus
%E Afshin Rostamizadeh
%E Sanjiv Kumar	
%F pmlr-v44-atzmon2015
%I PMLR
%P 30--48
%U https://proceedings.mlr.press/v44/atzmon2015.html
%V 44
%X Learning distance metrics from data is a fundamental problem in machine learning and useful way to extract data-driven features by using the matrix root of a distance matrix. Finding a proper metric amounts to optimization over the cone of positive definite (PD) matrices. This optimization is difficult since restricting optimization to remain within the PD cone or repeatedly projecting to the cone is prohibitively costly. Here we describe COMET, a block-coordinate descent procedure, which efficiently keeps the search within the PD cone, avoiding both costly projections and unnecessary computation of full gradients. COMET also continuously maintains the Cholesky root of the matrix, providing feature extraction and embedding of samples in a metric space. We further develop a structurally sparse variant of COMET, where only a small number of features interacts with other features. Sparse-COMET significantly accelerates both training and inference while improving interpretability. As a block-coordinate descent procedure, COMET has fast convergence bounds showing linear convergence with high probability. When tested on benchmark datasets in a task of retrieving similar images and similar text documents, COMET has significantly better precision than competing projection-free methods. Furthermore, sparse-COMET achieves almost identical precision as dense-COMET in document classification, while running 4.5 times faster, maintaining a 0.5% sparsity level, and outperforming competing methods both in precision and in run time.

RIS


TY  - CPAPER
TI  - Learning Sparse Metrics, One Feature at a Time
AU  - Yuval Atzmon
AU  - Uri Shalit
AU  - Gal Chechik
BT  - Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015
DA  - 2015/12/08
ED  - Dmitry Storcheus
ED  - Afshin Rostamizadeh
ED  - Sanjiv Kumar	
ID  - pmlr-v44-atzmon2015
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 44
SP  - 30
EP  - 48
L1  - http://proceedings.mlr.press/v44/atzmon2015.pdf
UR  - https://proceedings.mlr.press/v44/atzmon2015.html
AB  - Learning distance metrics from data is a fundamental problem in machine learning and useful way to extract data-driven features by using the matrix root of a distance matrix. Finding a proper metric amounts to optimization over the cone of positive definite (PD) matrices. This optimization is difficult since restricting optimization to remain within the PD cone or repeatedly projecting to the cone is prohibitively costly. Here we describe COMET, a block-coordinate descent procedure, which efficiently keeps the search within the PD cone, avoiding both costly projections and unnecessary computation of full gradients. COMET also continuously maintains the Cholesky root of the matrix, providing feature extraction and embedding of samples in a metric space. We further develop a structurally sparse variant of COMET, where only a small number of features interacts with other features. Sparse-COMET significantly accelerates both training and inference while improving interpretability. As a block-coordinate descent procedure, COMET has fast convergence bounds showing linear convergence with high probability. When tested on benchmark datasets in a task of retrieving similar images and similar text documents, COMET has significantly better precision than competing projection-free methods. Furthermore, sparse-COMET achieves almost identical precision as dense-COMET in document classification, while running 4.5 times faster, maintaining a 0.5% sparsity level, and outperforming competing methods both in precision and in run time. 
ER  -

APA


Atzmon, Y., Shalit, U. & Chechik, G.. (2015). Learning Sparse Metrics, One Feature at a Time. Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, in Proceedings of Machine Learning Research 44:30-48 Available from https://proceedings.mlr.press/v44/atzmon2015.html.

Related Material

Download PDF