Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations

Tam Le; Marco Cuturi

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2002-2011, 2015.

Abstract

Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.
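
To make the phrase "pull-back metrics of the Fisher information" concrete, here is a minimal sketch, not the authors' implementation. It assumes the two standard Aitchison operations on the simplex (perturbation and powering) and uses the closed-form Fisher geodesic distance between histograms, 2 arccos(sum_i sqrt(p_i q_i)). The function names, the parameters `lam` and `alpha`, and the order in which the two operations are composed are illustrative assumptions; the paper's exact parameterization is not given in the abstract, and the learning step (inverse volume maximization) is not shown.

```python
import numpy as np

def closure(x):
    """Rescale a non-negative vector so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(p, lam):
    """Aitchison perturbation: elementwise product with lam, then closure."""
    return closure(np.asarray(p, dtype=float) * np.asarray(lam, dtype=float))

def power(p, alpha):
    """Aitchison powering: elementwise power alpha, then closure."""
    return closure(np.asarray(p, dtype=float) ** alpha)

def fisher_distance(p, q):
    """Geodesic distance on the simplex under the Fisher information metric."""
    bc = np.sum(np.sqrt(np.asarray(p, dtype=float) * np.asarray(q, dtype=float)))
    # Clip guards against rounding pushing the coefficient slightly above 1.
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))

def pullback_distance(p, q, lam, alpha=1.0):
    """Distance under the Fisher metric pulled back through an assumed map
    F(x) = perturb(power(x, alpha), lam): transform both points, then
    measure the Fisher distance between the images."""
    def transform(x):
        return perturb(power(x, alpha), lam)
    return fisher_distance(transform(p), transform(q))

p = [0.2, 0.3, 0.5]
q = [0.5, 0.3, 0.2]
lam = [1.0, 2.0, 4.0]  # hypothetical parameter values, not learned
print(fisher_distance(p, q), pullback_distance(p, q, lam, alpha=0.5))
```

With `lam` set to all ones and `alpha = 1` the map is the identity, so the pulled-back distance reduces to the plain Fisher distance. Learning a metric in this family would amount to choosing `lam` and `alpha` from data.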

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-le15,
  title = 	 {Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations},
  author = 	 {Le, Tam and Cuturi, Marco},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {2002--2011},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/le15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/le15.html},
  abstract = 	 {Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.}
}

Endnote

%0 Conference Paper
%T Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations
%A Tam Le
%A Marco Cuturi
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-le15
%I PMLR
%P 2002--2011
%U https://proceedings.mlr.press/v37/le15.html
%V 37
%X Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.

RIS


TY  - CPAPER
TI  - Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations
AU  - Tam Le
AU  - Marco Cuturi
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-le15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 2002
EP  - 2011
L1  - http://proceedings.mlr.press/v37/le15.pdf
UR  - https://proceedings.mlr.press/v37/le15.html
AB  - Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.
ER  -

APA


Le, T. & Cuturi, M. (2015). Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2002-2011. Available from https://proceedings.mlr.press/v37/le15.html.
