Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2002-2011, 2015.
Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon(2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison(1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.