Global Ground Metric Learning with Applications to scRNA data

Damin Kühn, Michael T Schaub
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:3295-3303, 2025.

Abstract

Optimal transport (OT) provides a robust framework for comparing probability distributions. Its effectiveness is significantly influenced by the choice of the underlying ground metric. Traditionally, the ground metric has either been (i) predefined, e.g., as a Euclidean metric, or (ii) learned in a supervised way, by utilizing labeled data to learn a suitable ground metric for enhanced task-specific performance. Yet, predefined metrics typically cannot account for the inherent structure and varying significance of different features in the data, and existing supervised ground metric learning methods often fail to generalize across multiple classes or are limited to distributions with shared supports. To address this issue, this paper introduces a novel approach for learning metrics for arbitrary distributions over a shared metric space. Our method provides a distance between individual points (samples) like a global metric, but requires only class labels on a distribution-level for training. The resulting learned global ground metric enables more accurate OT distances, which can significantly improve clustering and classification tasks. Further, we can create task-specific shared embeddings for elements (samples) from different distributions, including unseen data.
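The abstract's point that an OT distance depends on the underlying ground metric can be illustrated with a minimal sketch. This is not the paper's learning method: the function names (`weighted_distance`, `ot_distance`), the toy points, and the hand-picked feature weights are all assumptions made for illustration. The sketch relies on the fact that for two uniform distributions with the same number of samples, the optimal transport plan can be taken to be a one-to-one matching, so for very small inputs the OT cost can be found by brute force over permutations.

```python
from itertools import permutations
from math import sqrt


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance; `weights` stands in for a (hypothetical) ground metric."""
    return sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))


def ot_distance(xs, ys, weights):
    """OT cost between two uniform empirical distributions with equally many samples.

    Brute-force search over all one-to-one matchings; only feasible for tiny inputs.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("this sketch assumes equally many samples in both distributions")
    best_total = min(
        sum(weighted_distance(xs[i], ys[perm[i]], weights) for i in range(n))
        for perm in permutations(range(n))
    )
    return best_total / n  # each sample carries mass 1/n


# Two toy distributions that differ only in the second feature.
xs = [(0.0, 0.0), (1.0, 0.0)]
ys = [(0.0, 1.0), (1.0, 1.0)]

d_euclidean = ot_distance(xs, ys, weights=(1.0, 1.0))      # plain Euclidean ground metric
d_ignore_second = ot_distance(xs, ys, weights=(1.0, 0.0))  # second feature given zero weight
d_stress_second = ot_distance(xs, ys, weights=(1.0, 4.0))  # second feature emphasized

print(d_euclidean, d_ignore_second, d_stress_second)
```

The same pair of distributions is far apart, identical, or even farther apart depending only on how the features are weighted. With a zero weight the function is only a pseudo-metric, since distinct points can then have distance zero.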

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-kuhn25a,
  title = {Global Ground Metric Learning with Applications to scRNA data},
  author = {K{\"u}hn, Damin and Schaub, Michael T},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {3295--3303},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/kuhn25a/kuhn25a.pdf},
  url = {https://proceedings.mlr.press/v258/kuhn25a.html},
  abstract = {Optimal transport (OT) provides a robust framework for comparing probability distributions. Its effectiveness is significantly influenced by the choice of the underlying ground metric. Traditionally, the ground metric has either been (i) predefined, e.g., as a Euclidean metric, or (ii) learned in a supervised way, by utilizing labeled data to learn a suitable ground metric for enhanced task-specific performance. Yet, predefined metrics typically cannot account for the inherent structure and varying significance of different features in the data, and existing supervised ground metric learning methods often fail to generalize across multiple classes or are limited to distributions with shared supports. To address this issue, this paper introduces a novel approach for learning metrics for arbitrary distributions over a shared metric space. Our method provides a distance between individual points (samples) like a global metric, but requires only class labels on a distribution-level for training. The resulting learned global ground metric enables more accurate OT distances, which can significantly improve clustering and classification tasks. Further, we can create task-specific shared embeddings for elements (samples) from different distributions, including unseen data.}
}
Endnote
%0 Conference Paper
%T Global Ground Metric Learning with Applications to scRNA data
%A Damin Kühn
%A Michael T Schaub
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-kuhn25a
%I PMLR
%P 3295--3303
%U https://proceedings.mlr.press/v258/kuhn25a.html
%V 258
%X Optimal transport (OT) provides a robust framework for comparing probability distributions. Its effectiveness is significantly influenced by the choice of the underlying ground metric. Traditionally, the ground metric has either been (i) predefined, e.g., as a Euclidean metric, or (ii) learned in a supervised way, by utilizing labeled data to learn a suitable ground metric for enhanced task-specific performance. Yet, predefined metrics typically cannot account for the inherent structure and varying significance of different features in the data, and existing supervised ground metric learning methods often fail to generalize across multiple classes or are limited to distributions with shared supports. To address this issue, this paper introduces a novel approach for learning metrics for arbitrary distributions over a shared metric space. Our method provides a distance between individual points (samples) like a global metric, but requires only class labels on a distribution-level for training. The resulting learned global ground metric enables more accurate OT distances, which can significantly improve clustering and classification tasks. Further, we can create task-specific shared embeddings for elements (samples) from different distributions, including unseen data.
APA
Kühn, D. & Schaub, M. T. (2025). Global Ground Metric Learning with Applications to scRNA data. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:3295-3303. Available from https://proceedings.mlr.press/v258/kuhn25a.html.
