Unsupervised Object Learning via Common Fate

Matthias Tangemann, Steffen Schneider, Julius von Kügelgen, Francesco Locatello, Peter Vincent Gehler, Thomas Brox, Matthias Kuemmerer, Matthias Bethge, Bernhard Schölkopf
Proceedings of the Second Conference on Causal Learning and Reasoning, PMLR 213:281-327, 2023.

Abstract

Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling. We decompose this problem into three easier subtasks and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative models are trained on the masks of the background and the moving objects, respectively. Third, background and foreground models are combined in a conditional “dead leaves” scene model to sample novel scene configurations where occlusions and depth layering arise naturally. To evaluate the individual stages, we introduce the FISHBOWL dataset positioned between complex real-world scenes and common object-centric benchmarks of simplistic objects. We show that our approach learns generative models that generalize beyond occlusions present in the input videos and represents scenes in a modular fashion, allowing generation of plausible scenes outside the training distribution by permitting, for instance, object numbers or densities not observed during training.
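The third stage described above composites the learned background and foreground models in a layered "dead leaves" fashion: objects are pasted back to front, so occlusion and depth ordering emerge from the paste order alone. The sketch below illustrates that compositing idea only; it is not the paper's implementation, and all names, shapes, and the (mask, texture) object representation are assumptions made for illustration.

```python
import numpy as np

def compose_dead_leaves(background, objects, positions):
    """Composite sampled objects onto a background in depth order.

    background: (H, W, C) array, a sample from the background model.
    objects:    list of (mask, texture) pairs, where mask is a boolean
                (h, w) array and texture an (h, w, C) appearance sample.
    positions:  list of (y, x) top-left paste coordinates.

    Later entries are pasted over earlier ones, so occlusions and
    depth layering arise naturally from the paste order.
    """
    scene = background.copy()
    for (mask, texture), (y, x) in zip(objects, positions):
        h, w = mask.shape
        region = scene[y:y + h, x:x + w]  # view into the scene
        region[mask] = texture[mask]      # overwrite only masked pixels
    return scene
```

Because each paste overwrites earlier content only where the object's mask is set, sampling more (or denser) objects than seen during training is just a longer list of pastes, which is one way to read the abstract's claim about out-of-distribution scene configurations.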

Cite this Paper

BibTeX
@InProceedings{pmlr-v213-tangemann23a,
  title     = {Unsupervised Object Learning via Common Fate},
  author    = {Tangemann, Matthias and Schneider, Steffen and von K\"ugelgen, Julius and Locatello, Francesco and Gehler, Peter Vincent and Brox, Thomas and Kuemmerer, Matthias and Bethge, Matthias and Sch\"olkopf, Bernhard},
  booktitle = {Proceedings of the Second Conference on Causal Learning and Reasoning},
  pages     = {281--327},
  year      = {2023},
  editor    = {van der Schaar, Mihaela and Zhang, Cheng and Janzing, Dominik},
  volume    = {213},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v213/tangemann23a/tangemann23a.pdf},
  url       = {https://proceedings.mlr.press/v213/tangemann23a.html},
  abstract  = {Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling. We decompose this problem into three easier subtasks and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative models are trained on the masks of the background and the moving objects, respectively. Third, background and foreground models are combined in a conditional ``dead leaves'' scene model to sample novel scene configurations where occlusions and depth layering arise naturally. To evaluate the individual stages, we introduce the FISHBOWL dataset positioned between complex real-world scenes and common object-centric benchmarks of simplistic objects. We show that our approach learns generative models that generalize beyond occlusions present in the input videos and represents scenes in a modular fashion, allowing generation of plausible scenes outside the training distribution by permitting, for instance, object numbers or densities not observed during training.}
}
Endnote
%0 Conference Paper
%T Unsupervised Object Learning via Common Fate
%A Matthias Tangemann
%A Steffen Schneider
%A Julius von Kügelgen
%A Francesco Locatello
%A Peter Vincent Gehler
%A Thomas Brox
%A Matthias Kuemmerer
%A Matthias Bethge
%A Bernhard Schölkopf
%B Proceedings of the Second Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2023
%E Mihaela van der Schaar
%E Cheng Zhang
%E Dominik Janzing
%F pmlr-v213-tangemann23a
%I PMLR
%P 281--327
%U https://proceedings.mlr.press/v213/tangemann23a.html
%V 213
%X Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling. We decompose this problem into three easier subtasks and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative models are trained on the masks of the background and the moving objects, respectively. Third, background and foreground models are combined in a conditional “dead leaves” scene model to sample novel scene configurations where occlusions and depth layering arise naturally. To evaluate the individual stages, we introduce the FISHBOWL dataset positioned between complex real-world scenes and common object-centric benchmarks of simplistic objects. We show that our approach learns generative models that generalize beyond occlusions present in the input videos and represents scenes in a modular fashion, allowing generation of plausible scenes outside the training distribution by permitting, for instance, object numbers or densities not observed during training.
APA
Tangemann, M., Schneider, S., von Kügelgen, J., Locatello, F., Gehler, P.V., Brox, T., Kuemmerer, M., Bethge, M. & Schölkopf, B. (2023). Unsupervised Object Learning via Common Fate. Proceedings of the Second Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 213:281-327. Available from https://proceedings.mlr.press/v213/tangemann23a.html.