Provably Learning Object-Centric Representations

Jack Brady, Roland S. Zimmermann, Yash Sharma, Bernhard Schölkopf, Julius Von Kügelgen, Wieland Brendel
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:3038-3062, 2023.

Abstract

Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical progress, a theoretical account of when unsupervised object-centric representation learning is possible is still lacking. Consequently, understanding the reasons for the success of existing object-centric methods as well as designing new theoretically grounded methods remains challenging. In the present work, we analyze when object-centric representations can provably be learned without supervision. To this end, we first introduce two assumptions on the generative process for scenes comprised of several objects, which we call compositionality and irreducibility. Under this generative process, we prove that the ground-truth object representations can be identified by an invertible and compositional inference model, even in the presence of dependencies between objects. We empirically validate our results through experiments on synthetic data. Finally, we provide evidence that our theory holds predictive power for existing object-centric models by showing a close correspondence between models’ compositionality and invertibility and their empirical identifiability.
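
As a purely illustrative sketch of the setting the abstract describes (the notation and formalization below are ours, not quoted from the paper, whose precise definitions of compositionality and irreducibility may differ):

% Illustrative sketch only (notation assumed, not taken from the paper):
% a scene x is rendered from K object latents ("slots") z_1, ..., z_K
% by a generator f.
\[
    x = f(z_1, \dots, z_K)
\]
% One natural reading of "compositional": each observed component x_n
% (e.g. a pixel) is influenced by at most one slot, i.e. for every n
% there is at most one index k with
\[
    \frac{\partial f_n}{\partial z_k}(z) \neq 0 .
\]
% In this picture, the abstract's claim is that an inference model which
% is itself invertible and compositional recovers the ground-truth slots
% (presumably up to relabeling of slots and slot-wise reparameterizations;
% see the paper for the exact statement), even when the slots z_1, ..., z_K
% are statistically dependent.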

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-brady23a,
  title     = {Provably Learning Object-Centric Representations},
  author    = {Brady, Jack and Zimmermann, Roland S. and Sharma, Yash and Sch\"{o}lkopf, Bernhard and Von K\"{u}gelgen, Julius and Brendel, Wieland},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {3038--3062},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/brady23a/brady23a.pdf},
  url       = {https://proceedings.mlr.press/v202/brady23a.html},
  abstract  = {Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical progress, a theoretical account of when unsupervised object-centric representation learning is possible is still lacking. Consequently, understanding the reasons for the success of existing object-centric methods as well as designing new theoretically grounded methods remains challenging. In the present work, we analyze when object-centric representations can provably be learned without supervision. To this end, we first introduce two assumptions on the generative process for scenes comprised of several objects, which we call compositionality and irreducibility. Under this generative process, we prove that the ground-truth object representations can be identified by an invertible and compositional inference model, even in the presence of dependencies between objects. We empirically validate our results through experiments on synthetic data. Finally, we provide evidence that our theory holds predictive power for existing object-centric models by showing a close correspondence between models’ compositionality and invertibility and their empirical identifiability.}
}
Endnote
%0 Conference Paper
%T Provably Learning Object-Centric Representations
%A Jack Brady
%A Roland S. Zimmermann
%A Yash Sharma
%A Bernhard Schölkopf
%A Julius Von Kügelgen
%A Wieland Brendel
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-brady23a
%I PMLR
%P 3038--3062
%U https://proceedings.mlr.press/v202/brady23a.html
%V 202
%X Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical progress, a theoretical account of when unsupervised object-centric representation learning is possible is still lacking. Consequently, understanding the reasons for the success of existing object-centric methods as well as designing new theoretically grounded methods remains challenging. In the present work, we analyze when object-centric representations can provably be learned without supervision. To this end, we first introduce two assumptions on the generative process for scenes comprised of several objects, which we call compositionality and irreducibility. Under this generative process, we prove that the ground-truth object representations can be identified by an invertible and compositional inference model, even in the presence of dependencies between objects. We empirically validate our results through experiments on synthetic data. Finally, we provide evidence that our theory holds predictive power for existing object-centric models by showing a close correspondence between models’ compositionality and invertibility and their empirical identifiability.
APA
Brady, J., Zimmermann, R.S., Sharma, Y., Schölkopf, B., Von Kügelgen, J. & Brendel, W. (2023). Provably Learning Object-Centric Representations. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:3038-3062. Available from https://proceedings.mlr.press/v202/brady23a.html.
