Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations

Tam Le, Marco Cuturi
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2002-2011, 2015.

Abstract

Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.
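
For readers who want the objects in the abstract spelled out, here is a brief recall in our own notation (not quoted from the paper). On the open probability simplex, the Aitchison (1982) perturbation and powering operations, the geodesic distance of the Fisher information metric, and the distance induced by pulling that metric back through a diffeomorphism F of the simplex read, in LaTeX:

    % Aitchison perturbation and powering on the open simplex
    x \oplus y = \Bigl( \tfrac{x_1 y_1}{\sum_j x_j y_j}, \dots, \tfrac{x_n y_n}{\sum_j x_j y_j} \Bigr),
    \qquad
    \alpha \odot x = \Bigl( \tfrac{x_1^{\alpha}}{\sum_j x_j^{\alpha}}, \dots, \tfrac{x_n^{\alpha}}{\sum_j x_j^{\alpha}} \Bigr)

    % Geodesic distance of the Fisher information metric on the simplex
    d_{\mathrm{FIM}}(x, y) = 2 \arccos \Bigl( \sum_j \sqrt{x_j y_j} \Bigr)

    % Distance under the metric pulled back through F
    d_F(x, y) = d_{\mathrm{FIM}}\bigl( F(x), F(y) \bigr)

Lebanon's (2006) inverse-volume criterion, as we read it, fits a metric g_\theta within a parametric family by treating the inverse Riemannian volume element as an unnormalized density and maximizing the likelihood of the data:

    \max_{\theta} \; \prod_{i=1}^{N} \frac{\det g_{\theta}(x_i)^{-1/2}}{\int_{\Delta} \det g_{\theta}(x)^{-1/2} \, dx}

The normalizing integral over the simplex is intractable in general, which is what motivates the sampling and contrastive-divergence scheme mentioned in the abstract. Below is a minimal numerical sketch of the pull-back distance, assuming a transformation of the simple form F(x) = alpha ⊙ (x ⊕ lam), one member of the family of Aitchison transformations; all function names here are ours, chosen for illustration:

    import numpy as np

    def perturb(x, y):
        """Aitchison perturbation: componentwise product, renormalized to the simplex."""
        z = x * y
        return z / z.sum()

    def power(alpha, x):
        """Aitchison powering: componentwise power, renormalized to the simplex."""
        z = x ** alpha
        return z / z.sum()

    def fisher_distance(x, y):
        """Geodesic distance of the Fisher information metric on the simplex."""
        # Clip guards against round-off pushing the Bhattacharyya coefficient above 1.
        return 2.0 * np.arccos(np.clip(np.sqrt(x * y).sum(), -1.0, 1.0))

    def pullback_distance(x, y, lam, alpha):
        """Distance between x and y under the Fisher metric pulled back through F."""
        F = lambda p: power(alpha, perturb(p, lam))
        return fisher_distance(F(x), F(y))

    x = np.array([0.2, 0.3, 0.5])
    y = np.array([0.1, 0.6, 0.3])
    lam = np.array([0.4, 0.4, 0.2])  # perturbation parameter; learned from data in the paper
    print(pullback_distance(x, y, lam, alpha=0.5))
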

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-le15,
  title     = {Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations},
  author    = {Le, Tam and Cuturi, Marco},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {2002--2011},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/le15.pdf},
  url       = {https://proceedings.mlr.press/v37/le15.html},
  abstract  = {Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.}
}
Endnote
%0 Conference Paper
%T Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations
%A Tam Le
%A Marco Cuturi
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-le15
%I PMLR
%P 2002--2011
%U https://proceedings.mlr.press/v37/le15.html
%V 37
%X Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.
RIS
TY - CPAPER
TI - Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations
AU - Tam Le
AU - Marco Cuturi
BT - Proceedings of the 32nd International Conference on Machine Learning
DA - 2015/06/01
ED - Francis Bach
ED - David Blei
ID - pmlr-v37-le15
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 37
SP - 2002
EP - 2011
L1 - http://proceedings.mlr.press/v37/le15.pdf
UR - https://proceedings.mlr.press/v37/le15.html
AB - Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.
ER -
APA
Le, T. & Cuturi, M. (2015). Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2002-2011. Available from https://proceedings.mlr.press/v37/le15.html.
