Multi-Object Representation Learning with Iterative Variational Inference

Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Christopher Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2424-2433, 2019.

Abstract

Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns – without supervision – to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-greff19a,
  title     = {Multi-Object Representation Learning with Iterative Variational Inference},
  author    = {Greff, Klaus and Kaufman, Rapha{\"e}l Lopez and Kabra, Rishabh and Watters, Nick and Burgess, Christopher and Zoran, Daniel and Matthey, Loic and Botvinick, Matthew and Lerchner, Alexander},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2424--2433},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/greff19a/greff19a.pdf},
  url       = {https://proceedings.mlr.press/v97/greff19a.html},
  abstract  = {Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns – without supervision – to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.}
}
Endnote
%0 Conference Paper
%T Multi-Object Representation Learning with Iterative Variational Inference
%A Klaus Greff
%A Raphaël Lopez Kaufman
%A Rishabh Kabra
%A Nick Watters
%A Christopher Burgess
%A Daniel Zoran
%A Loic Matthey
%A Matthew Botvinick
%A Alexander Lerchner
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-greff19a
%I PMLR
%P 2424--2433
%U https://proceedings.mlr.press/v97/greff19a.html
%V 97
%X Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns – without supervision – to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.
APA
Greff, K., Kaufman, R. L., Kabra, R., Watters, N., Burgess, C., Zoran, D., Matthey, L., Botvinick, M. & Lerchner, A. (2019). Multi-Object Representation Learning with Iterative Variational Inference. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2424-2433. Available from https://proceedings.mlr.press/v97/greff19a.html.
