Unlocking Slot Attention by Changing Optimal Transport Costs

Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:41931-41951, 2023.

Abstract

Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-zhang23ba,
  title = {Unlocking Slot Attention by Changing Optimal Transport Costs},
  author = {Zhang, Yan and Zhang, David W. and Lacoste-Julien, Simon and Burghouts, Gertjan J. and Snoek, Cees G. M.},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {41931--41951},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/zhang23ba/zhang23ba.pdf},
  url = {https://proceedings.mlr.press/v202/zhang23ba.html},
  abstract = {Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.}
}
Endnote
%0 Conference Paper
%T Unlocking Slot Attention by Changing Optimal Transport Costs
%A Yan Zhang
%A David W. Zhang
%A Simon Lacoste-Julien
%A Gertjan J. Burghouts
%A Cees G. M. Snoek
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhang23ba
%I PMLR
%P 41931--41951
%U https://proceedings.mlr.press/v202/zhang23ba.html
%V 202
%X Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
APA
Zhang, Y., Zhang, D.W., Lacoste-Julien, S., Burghouts, G.J. & Snoek, C.G.M. (2023). Unlocking Slot Attention by Changing Optimal Transport Costs. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:41931-41951. Available from https://proceedings.mlr.press/v202/zhang23ba.html.
