Fast 1-Wasserstein distance approximations using greedy strategies

Guillaume Houry, Han Bao, Han Zhao, Makoto Yamada
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:325-333, 2024.

Abstract

Among numerous linear approximation methods proposed for optimal transport (OT), tree-based methods appear to be fairly reliable, notably for language processing applications. Inspired by these tree methods, we introduce several greedy heuristics aiming to compute even faster approximations of OT. We first explicitly establish the equivalence between greedy matching and optimal transport for tree metrics, and then we show that tree greedy matching can be reduced to greedy matching on a one-dimensional line. Next, we propose two new greedy-based algorithms in one dimension: the $k$-Greedy and 1D-ICT algorithms. This novel approach provides Wasserstein approximations with accuracy similar to the original tree methods on text datasets while being faster in practice. Finally, these algorithms are applicable beyond tree approximations: using sliced projections of the original data still provides fairly good accuracy while eliminating the need for embedding the data in a fixed and rigid tree structure. This property makes these approaches even more versatile than the original tree OT methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-houry24a, title = {Fast 1-{W}asserstein distance approximations using greedy strategies}, author = {Houry, Guillaume and Bao, Han and Zhao, Han and Yamada, Makoto}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {325--333}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/houry24a/houry24a.pdf}, url = {https://proceedings.mlr.press/v238/houry24a.html}, abstract = {Among numerous linear approximation methods proposed for optimal transport (OT), tree-based methods appear to be fairly reliable, notably for language processing applications. Inspired by these tree methods, we introduce several greedy heuristics aiming to compute even faster approximations of OT. We first explicitly establish the equivalence between greedy matching and optimal transport for tree metrics, and then we show that tree greedy matching can be reduced to greedy matching on a one-dimensional line. Next, we propose two new greedy-based algorithms in one dimension: the $k$-Greedy and 1D-ICT algorithms. This novel approach provides Wasserstein approximations with accuracy similar to the original tree methods on text datasets while being faster in practice. Finally, these algorithms are applicable beyond tree approximations: using sliced projections of the original data still provides fairly good accuracy while eliminating the need for embedding the data in a fixed and rigid tree structure. This property makes these approaches even more versatile than the original tree OT methods.} }
Endnote
%0 Conference Paper %T Fast 1-Wasserstein distance approximations using greedy strategies %A Guillaume Houry %A Han Bao %A Han Zhao %A Makoto Yamada %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-houry24a %I PMLR %P 325--333 %U https://proceedings.mlr.press/v238/houry24a.html %V 238 %X Among numerous linear approximation methods proposed for optimal transport (OT), tree-based methods appear to be fairly reliable, notably for language processing applications. Inspired by these tree methods, we introduce several greedy heuristics aiming to compute even faster approximations of OT. We first explicitly establish the equivalence between greedy matching and optimal transport for tree metrics, and then we show that tree greedy matching can be reduced to greedy matching on a one-dimensional line. Next, we propose two new greedy-based algorithms in one dimension: the $k$-Greedy and 1D-ICT algorithms. This novel approach provides Wasserstein approximations with accuracy similar to the original tree methods on text datasets while being faster in practice. Finally, these algorithms are applicable beyond tree approximations: using sliced projections of the original data still provides fairly good accuracy while eliminating the need for embedding the data in a fixed and rigid tree structure. This property makes these approaches even more versatile than the original tree OT methods.
APA
Houry, G., Bao, H., Zhao, H. & Yamada, M.. (2024). Fast 1-Wasserstein distance approximations using greedy strategies. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:325-333 Available from https://proceedings.mlr.press/v238/houry24a.html.

Related Material