Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets

Tal Shnitzer, Mikhail Yurochkin, Kristjan Greenewald, Justin M Solomon
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20106-20124, 2022.

Abstract

The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.
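
For readers unfamiliar with the log-Euclidean metric mentioned above, the following is a minimal sketch of the standard log-Euclidean distance between two SPD matrices of the same size, computed as the Frobenius norm of the difference of their matrix logarithms. It is not the LES distance defined in the paper, which is based on a lower bound of this metric and also handles datasets of different sizes. The function names `spd_log` and `log_euclidean_distance` are hypothetical and chosen only for illustration.

```python
import numpy as np


def spd_log(mat):
    """Matrix logarithm of a symmetric positive-definite matrix.

    Uses the eigendecomposition mat = V diag(w) V^T, so that
    log(mat) = V diag(log w) V^T. Assumes all eigenvalues are > 0.
    """
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Frobenius norm of log(a) - log(b) for same-size SPD matrices."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


# Toy example (illustrative data, not from the paper): two diagonal SPD
# matrices that differ by a factor of e in a single eigenvalue.
a = np.diag([1.0, 2.0])
b = np.diag([1.0, 2.0 * np.e])
dist = log_euclidean_distance(a, b)  # approximately 1.0
```

Because this standard form needs both matrices to have the same dimensions and compares them entry by entry after taking logarithms, it implicitly assumes aligned samples, which is the limitation the abstract says the LES distance is designed to avoid.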

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-shnitzer22a,
  title = {Log-{E}uclidean Signatures for Intrinsic Distances Between Unaligned Datasets},
  author = {Shnitzer, Tal and Yurochkin, Mikhail and Greenewald, Kristjan and Solomon, Justin M},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {20106--20124},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/shnitzer22a/shnitzer22a.pdf},
  url = {https://proceedings.mlr.press/v162/shnitzer22a.html},
  abstract = {The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.}
}
Endnote
%0 Conference Paper
%T Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets
%A Tal Shnitzer
%A Mikhail Yurochkin
%A Kristjan Greenewald
%A Justin M Solomon
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-shnitzer22a
%I PMLR
%P 20106--20124
%U https://proceedings.mlr.press/v162/shnitzer22a.html
%V 162
%X The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.
APA
Shnitzer, T., Yurochkin, M., Greenewald, K. & Solomon, J.M. (2022). Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20106-20124. Available from https://proceedings.mlr.press/v162/shnitzer22a.html.
