Scalable Approximation Algorithms for $p$-Wasserstein Distance and Its Variants

Nathaniel Lahn, Sharath Raghvendra, Emma Saarinen, Pouyan Shirzadian
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:32188-32205, 2025.

Abstract

The $p$-Wasserstein distance measures the cost of optimally transporting one distribution to another, where the cost of moving a unit mass from $a$ to $b$ is the $p^{th}$ power of the ground distance $\mathrm{d}(a,b)$ between them. Despite its strong theoretical properties, its use in practice – especially for $p \ge 2$ – is limited due to two key challenges: sensitivity to noise and a lack of scalable algorithms. We identify noise sensitivity as a key reason why some existing approximation algorithms for $p=1$ fail to generalize to $p \ge 2$ and then present new algorithms for approximating the $p$-Wasserstein distance and its variant. First, when $\mathrm{d}(\cdot,\cdot)$ is a metric, for any constant $p \ge 2$, we present a novel relative $O(\log n)$-approximation algorithm to compute the $p$-Wasserstein distance between any two discrete distributions of size $n$. The algorithm runs in $O(n^2 \log U\log \Delta\log n)$ time, where $\log U$ is the bit-length of the input probabilities and $\Delta$ is the ratio of the largest to the smallest pairwise distance. We use $p$ hierarchically well-separated trees to define a distance that approximates the $p$-Wasserstein cost within a factor of $O(\log n)$ and then present a simple primal-dual algorithm to compute the $p$-Wasserstein cost with respect to this distance. Second, due to the noise sensitivity of the $p$-Wasserstein distance, we show that existing combinatorial approaches require $\Omega(n^2/\delta^p)$ time to approximate the $p$-Wasserstein distance within an additive error of $\delta$. In contrast, we show that, for any arbitrary distance $\mathrm{d}(\cdot,\cdot)$, a recent noise-resistant variant of the $p$-Wasserstein distance, called the $p$-RPW distance, can be approximated in $O(n^2/\delta^3)$ time.
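
As a point of reference for the quantity being approximated, the sketch below computes the exact $p$-Wasserstein distance for small inputs by solving the standard optimal-transport linear program with SciPy. This is only an illustration of the objective described in the abstract (per-unit cost $\mathrm{d}(a,b)^p$, followed by a $p$-th root); it is not the paper's $O(\log n)$-approximation algorithm nor its $p$-RPW procedure, and the helper name p_wasserstein and the toy data are illustrative.

```python
# Minimal sketch: exact p-Wasserstein cost between two small discrete
# distributions via the standard optimal-transport LP (brute force, O(n^2)
# variables). Intended only to make the objective concrete, not to scale.
import numpy as np
from scipy.optimize import linprog

def p_wasserstein(mu, nu, D, p=2):
    """mu, nu: length-n probability vectors; D: n x n ground distances."""
    n = len(mu)
    C = (D ** p).reshape(-1)                 # per-unit transport costs d(a,b)^p
    # Equality constraints: row sums of the transport plan equal mu,
    # column sums equal nu (plan flattened row-major: x[i*n + j]).
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # mass leaving point i
        A_eq[n + i, i::n] = 1.0              # mass arriving at point i
    b_eq = np.concatenate([mu, nu])
    res = linprog(C, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun ** (1.0 / p)              # p-th root gives W_p

# Toy example: two uniform distributions on 4 random planar points each.
rng = np.random.default_rng(0)
pts_a, pts_b = rng.random((4, 2)), rng.random((4, 2))
D = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
mu = nu = np.full(4, 0.25)
print(p_wasserstein(mu, nu, D, p=2))
```
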

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lahn25a,
  title     = {Scalable Approximation Algorithms for $p$-{W}asserstein Distance and Its Variants},
  author    = {Lahn, Nathaniel and Raghvendra, Sharath and Saarinen, Emma and Shirzadian, Pouyan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {32188--32205},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lahn25a/lahn25a.pdf},
  url       = {https://proceedings.mlr.press/v267/lahn25a.html}
}
Endnote
%0 Conference Paper
%T Scalable Approximation Algorithms for $p$-Wasserstein Distance and Its Variants
%A Nathaniel Lahn
%A Sharath Raghvendra
%A Emma Saarinen
%A Pouyan Shirzadian
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lahn25a
%I PMLR
%P 32188--32205
%U https://proceedings.mlr.press/v267/lahn25a.html
%V 267
APA
Lahn, N., Raghvendra, S., Saarinen, E. &amp; Shirzadian, P. (2025). Scalable Approximation Algorithms for $p$-Wasserstein Distance and Its Variants. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:32188-32205. Available from https://proceedings.mlr.press/v267/lahn25a.html.