Fast Dictionary Learning with a Smoothed Wasserstein Loss

Antoine Rolet; Marco Cuturi; Gabriel Peyré

Fast Dictionary Learning with a Smoothed Wasserstein Loss

Antoine Rolet, Marco Cuturi, Gabriel Peyré

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:630-638, 2016.

Abstract

We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. This problem can be tackled using non-negative matrix factorization approaches, using typically Euclidean or Kullback-Leibler fitting errors. Because these fitting errors are separable and treat each feature on equal footing, they are blind to any similarity the features may share. We assume in this work that we have prior knowledge on these features. To leverage this side-information, we propose to use the Wasserstein (\textita.k.a. earth mover’s or optimal transport) distance as the fitting error between each original point and its reconstruction, and we propose scalable algorithms to to so. Our methods build upon Fenchel duality and entropic regularization of Wasserstein distances, which improves not only speed but also computational stability. We apply these techniques on face images and text documents. We show in particular that we can learn dictionaries (topics) for bag-of-word representations of texts using words that may not have appeared in the original texts, or even words that come from a different language than that used in the texts.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-rolet16,
  title = 	 {Fast Dictionary Learning with a Smoothed Wasserstein Loss},
  author = 	 {Rolet, Antoine and Cuturi, Marco and Peyré, Gabriel},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {630--638},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/rolet16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/rolet16.html},
  abstract = 	 {We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. This problem can be tackled using non-negative matrix factorization approaches, using typically Euclidean or Kullback-Leibler fitting errors. Because these fitting errors are separable and treat each feature on equal footing, they are blind to any similarity the features may share. We assume in this work that we have prior knowledge on these features. To leverage this side-information, we propose to use the Wasserstein (\textita.k.a. earth mover’s or optimal transport) distance as the fitting error between each original point and its reconstruction, and we propose scalable algorithms to to so. Our methods build upon Fenchel duality and entropic regularization of Wasserstein distances, which improves not only speed but also computational stability. We apply these techniques on face images and text documents. We show in particular that we can learn dictionaries (topics) for bag-of-word representations of texts using words that may not have appeared in the original texts, or even words that come from a different language than that used in the texts.}
}

Endnote

%0 Conference Paper
%T Fast Dictionary Learning with a Smoothed Wasserstein Loss
%A Antoine Rolet
%A Marco Cuturi
%A Gabriel Peyré
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert	
%F pmlr-v51-rolet16
%I PMLR
%P 630--638
%U https://proceedings.mlr.press/v51/rolet16.html
%V 51
%X We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. This problem can be tackled using non-negative matrix factorization approaches, using typically Euclidean or Kullback-Leibler fitting errors. Because these fitting errors are separable and treat each feature on equal footing, they are blind to any similarity the features may share. We assume in this work that we have prior knowledge on these features. To leverage this side-information, we propose to use the Wasserstein (\textita.k.a. earth mover’s or optimal transport) distance as the fitting error between each original point and its reconstruction, and we propose scalable algorithms to to so. Our methods build upon Fenchel duality and entropic regularization of Wasserstein distances, which improves not only speed but also computational stability. We apply these techniques on face images and text documents. We show in particular that we can learn dictionaries (topics) for bag-of-word representations of texts using words that may not have appeared in the original texts, or even words that come from a different language than that used in the texts.

RIS


TY  - CPAPER
TI  - Fast Dictionary Learning with a Smoothed Wasserstein Loss
AU  - Antoine Rolet
AU  - Marco Cuturi
AU  - Gabriel Peyré
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert	
ID  - pmlr-v51-rolet16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 630
EP  - 638
L1  - http://proceedings.mlr.press/v51/rolet16.pdf
UR  - https://proceedings.mlr.press/v51/rolet16.html
AB  - We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. This problem can be tackled using non-negative matrix factorization approaches, using typically Euclidean or Kullback-Leibler fitting errors. Because these fitting errors are separable and treat each feature on equal footing, they are blind to any similarity the features may share. We assume in this work that we have prior knowledge on these features. To leverage this side-information, we propose to use the Wasserstein (\textita.k.a. earth mover’s or optimal transport) distance as the fitting error between each original point and its reconstruction, and we propose scalable algorithms to to so. Our methods build upon Fenchel duality and entropic regularization of Wasserstein distances, which improves not only speed but also computational stability. We apply these techniques on face images and text documents. We show in particular that we can learn dictionaries (topics) for bag-of-word representations of texts using words that may not have appeared in the original texts, or even words that come from a different language than that used in the texts.
ER  -

APA


Rolet, A., Cuturi, M. & Peyré, G.. (2016). Fast Dictionary Learning with a Smoothed Wasserstein Loss. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:630-638 Available from https://proceedings.mlr.press/v51/rolet16.html.

Fast Dictionary Learning with a Smoothed Wasserstein Loss

Abstract

Cite this Paper

Related Material