COAT: Measuring Object Compositionality in Emergent Representations

Sirui Xie, Ari S Morcos, Song-Chun Zhu, Ramakrishna Vedantam
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:24388-24413, 2022.

Abstract

Learning representations that can decompose a multi-object scene into its constituent objects and recompose them flexibly is desirable for object-oriented reasoning and planning. Built upon object masks in the pixel space, existing metrics for objectness can only evaluate generative models with an object-specific “slot” structure. We propose to directly measure compositionality in the representation space as a form of objectness, making such evaluations tractable for a wider class of models. Our metric, COAT (Compositional Object Algebra Test), evaluates whether a generic representation exhibits certain geometric properties that underpin object compositionality beyond what is already captured by the raw pixel space. Our experiments on the popular CLEVR (Johnson et al., 2018) domain reveal that existing disentanglement-based generative models are not as compositional as one might expect, suggesting room for further modeling improvements. We hope our work allows for a unified evaluation of object-centric representations, spanning generative as well as discriminative, self-supervised models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-xie22b,
  title = {{COAT}: Measuring Object Compositionality in Emergent Representations},
  author = {Xie, Sirui and Morcos, Ari S and Zhu, Song-Chun and Vedantam, Ramakrishna},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {24388--24413},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/xie22b/xie22b.pdf},
  url = {https://proceedings.mlr.press/v162/xie22b.html},
  abstract = {Learning representations that can decompose a multi-object scene into its constituent objects and recompose them flexibly is desirable for object-oriented reasoning and planning. Built upon object masks in the pixel space, existing metrics for objectness can only evaluate generative models with an object-specific “slot” structure. We propose to directly measure compositionality in the representation space as a form of objectness, making such evaluations tractable for a wider class of models. Our metric, COAT (Compositional Object Algebra Test), evaluates whether a generic representation exhibits certain geometric properties that underpin object compositionality beyond what is already captured by the raw pixel space. Our experiments on the popular CLEVR (Johnson et al., 2018) domain reveal that existing disentanglement-based generative models are not as compositional as one might expect, suggesting room for further modeling improvements. We hope our work allows for a unified evaluation of object-centric representations, spanning generative as well as discriminative, self-supervised models.}
}
Endnote
%0 Conference Paper
%T COAT: Measuring Object Compositionality in Emergent Representations
%A Sirui Xie
%A Ari S Morcos
%A Song-Chun Zhu
%A Ramakrishna Vedantam
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-xie22b
%I PMLR
%P 24388--24413
%U https://proceedings.mlr.press/v162/xie22b.html
%V 162
%X Learning representations that can decompose a multi-object scene into its constituent objects and recompose them flexibly is desirable for object-oriented reasoning and planning. Built upon object masks in the pixel space, existing metrics for objectness can only evaluate generative models with an object-specific “slot” structure. We propose to directly measure compositionality in the representation space as a form of objectness, making such evaluations tractable for a wider class of models. Our metric, COAT (Compositional Object Algebra Test), evaluates whether a generic representation exhibits certain geometric properties that underpin object compositionality beyond what is already captured by the raw pixel space. Our experiments on the popular CLEVR (Johnson et al., 2018) domain reveal that existing disentanglement-based generative models are not as compositional as one might expect, suggesting room for further modeling improvements. We hope our work allows for a unified evaluation of object-centric representations, spanning generative as well as discriminative, self-supervised models.
APA
Xie, S., Morcos, A.S., Zhu, S. & Vedantam, R. (2022). COAT: Measuring Object Compositionality in Emergent Representations. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:24388-24413. Available from https://proceedings.mlr.press/v162/xie22b.html.
