Volume-Aware Distance for Robust Similarity Learning

Shuo Chen, Chen Gong, Jun Li, Jian Yang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:8123-8144, 2025.

Abstract

Measuring the similarity between data points plays a vital role in lots of popular representation learning tasks such as metric learning and contrastive learning. Most existing approaches utilize point-level distances to learn the point-to-point similarity between pairwise instances. However, since the finite number of training data points cannot fully cover the whole sample space consisting of an infinite number of points, the generalizability of the learned distance is usually limited by the sample size. In this paper, we thus extend the conventional form of data point to the new form of data ball with a predictable volume, so that we can naturally generalize the existing point-level distance to a new volume-aware distance (VAD) which measures the field-to-field geometric similarity. The learned VAD not only takes into account the relationship between observed instances but also uncovers the similarity among those unsampled neighbors surrounding the training data. This practice significantly enriches the coverage of sample space and thus improves the model generalizability. Theoretically, we prove that VAD tightens the error bound of traditional similarity learning and preserves crucial topological properties. Experiments on multi-domain data demonstrate the superiority of VAD over existing approaches in both supervised and unsupervised tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-chen25u,
  title     = {Volume-Aware Distance for Robust Similarity Learning},
  author    = {Chen, Shuo and Gong, Chen and Li, Jun and Yang, Jian},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {8123--8144},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/chen25u/chen25u.pdf},
  url       = {https://proceedings.mlr.press/v267/chen25u.html},
  abstract  = {Measuring the similarity between data points plays a vital role in lots of popular representation learning tasks such as metric learning and contrastive learning. Most existing approaches utilize point-level distances to learn the point-to-point similarity between pairwise instances. However, since the finite number of training data points cannot fully cover the whole sample space consisting of an infinite number of points, the generalizability of the learned distance is usually limited by the sample size. In this paper, we thus extend the conventional form of data point to the new form of data ball with a predictable volume, so that we can naturally generalize the existing point-level distance to a new volume-aware distance (VAD) which measures the field-to-field geometric similarity. The learned VAD not only takes into account the relationship between observed instances but also uncovers the similarity among those unsampled neighbors surrounding the training data. This practice significantly enriches the coverage of sample space and thus improves the model generalizability. Theoretically, we prove that VAD tightens the error bound of traditional similarity learning and preserves crucial topological properties. Experiments on multi-domain data demonstrate the superiority of VAD over existing approaches in both supervised and unsupervised tasks.}
}
Endnote
%0 Conference Paper
%T Volume-Aware Distance for Robust Similarity Learning
%A Shuo Chen
%A Chen Gong
%A Jun Li
%A Jian Yang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-chen25u
%I PMLR
%P 8123--8144
%U https://proceedings.mlr.press/v267/chen25u.html
%V 267
%X Measuring the similarity between data points plays a vital role in lots of popular representation learning tasks such as metric learning and contrastive learning. Most existing approaches utilize point-level distances to learn the point-to-point similarity between pairwise instances. However, since the finite number of training data points cannot fully cover the whole sample space consisting of an infinite number of points, the generalizability of the learned distance is usually limited by the sample size. In this paper, we thus extend the conventional form of data point to the new form of data ball with a predictable volume, so that we can naturally generalize the existing point-level distance to a new volume-aware distance (VAD) which measures the field-to-field geometric similarity. The learned VAD not only takes into account the relationship between observed instances but also uncovers the similarity among those unsampled neighbors surrounding the training data. This practice significantly enriches the coverage of sample space and thus improves the model generalizability. Theoretically, we prove that VAD tightens the error bound of traditional similarity learning and preserves crucial topological properties. Experiments on multi-domain data demonstrate the superiority of VAD over existing approaches in both supervised and unsupervised tasks.
APA
Chen, S., Gong, C., Li, J., & Yang, J. (2025). Volume-Aware Distance for Robust Similarity Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:8123-8144. Available from https://proceedings.mlr.press/v267/chen25u.html.

Related Material