Compositional Interfaces for Compositional Generalization

Jelena Luketina, Jack Lanchantin, Sainbayar Sukhbaatar, Arthur Szlam
Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274:692-709, 2025.

Abstract

With recent work such as GATO (Reed et al., 2022), we see the development of agents that can accomplish a variety of tasks while perceiving the world and acting in multiple observation and action spaces. We would want such agents to exhibit compositional generalization to unseen combinations of observation and action spaces, and to adapt quickly to novel observation spaces by transferring knowledge. Yet, the specific setting requiring generalization to unseen compositions of observational modalities, action spaces, and instructions has not been systematically studied before. In this work, we demonstrate how such generalization can be achieved through the use of end-to-end modular architectures: the encoding of observations and the prediction of actions are handled by differentiable modules specialized to that space, with a single shared controller between them. To study the properties of such modular architectures in a controlled manner, we construct an environment with a compositional structure, where each instance of the environment is created by combining an observation, action, and instruction space from a large set of options. We demonstrate that through the use of modularity, agents can generalize to unseen combinations of observation, action, and instruction spaces, even when the unseen combinations are more challenging. Moreover, we demonstrate that modularity enables quick integration of novel observation modalities, requiring only adaptation of the modules encoding the new observation.
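The modular architecture described in the abstract — per-space encoder and decoder modules around a single shared controller — can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation; all names, dimensions, and module shapes are assumptions.

```python
# Toy sketch of a modular agent: swappable per-space encoders/decoders
# around one shared controller. Illustrative only; names and shapes
# are hypothetical, not taken from the paper.

def make_linear(in_dim, out_dim, scale=0.1):
    """Return a toy 'linear layer' closure over a fixed weight matrix."""
    w = [[scale * ((i + j) % 3 - 1) for j in range(in_dim)]
         for i in range(out_dim)]
    def apply(x):
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return apply

HIDDEN = 8  # shared interface width between modules and controller

class ModularAgent:
    """Encoders/decoders are specialized per space; the controller is shared."""
    def __init__(self, obs_dims, act_dims):
        self.encoders = {name: make_linear(d, HIDDEN) for name, d in obs_dims.items()}
        self.controller = make_linear(HIDDEN, HIDDEN)  # shared across all combinations
        self.decoders = {name: make_linear(HIDDEN, d) for name, d in act_dims.items()}

    def act(self, obs_space, action_space, observation):
        h = self.encoders[obs_space](observation)   # space-specific encoding
        h = self.controller(h)                      # shared processing
        return self.decoders[action_space](h)       # space-specific action head

agent = ModularAgent(obs_dims={"grid": 4, "text": 6},
                     act_dims={"discrete": 3, "continuous": 2})
# An unseen (observation, action) pairing still runs, because every module
# speaks the same shared interface:
out = agent.act("text", "continuous", [0.5] * 6)
print(len(out))  # 2

# Integrating a novel observation modality requires only a new encoder;
# the controller and decoders are untouched:
agent.encoders["audio"] = make_linear(5, HIDDEN)
```

The key design choice this illustrates is the fixed-width interface (`HIDDEN`): because every encoder maps into it and every decoder reads from it, observation and action modules can be recombined freely at inference time.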

Cite this Paper


BibTeX
@InProceedings{pmlr-v274-luketina25a,
  title     = {Compositional Interfaces for Compositional Generalization},
  author    = {Luketina, Jelena and Lanchantin, Jack and Sukhbaatar, Sainbayar and Szlam, Arthur},
  booktitle = {Proceedings of The 3rd Conference on Lifelong Learning Agents},
  pages     = {692--709},
  year      = {2025},
  editor    = {Lomonaco, Vincenzo and Melacci, Stefano and Tuytelaars, Tinne and Chandar, Sarath and Pascanu, Razvan},
  volume    = {274},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jul--01 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v274/main/assets/luketina25a/luketina25a.pdf},
  url       = {https://proceedings.mlr.press/v274/luketina25a.html},
  abstract  = {With recent work such as GATO (Reed et al., 2022) we see the development of agents that can accomplish a variety of tasks, while perceiving the world and acting in multiple observation and action spaces. We would want such agents to exhibit compositional generalization to unseen combinations of observation and action spaces, and adapt quickly to novel observation spaces by transferring knowledge. Yet, the specific setting requiring generalization to unseen compositions of observational modalities, action spaces, and instructions has not been systematically studied before. In this work, we demonstrate how such generalization can be achieved through the use of end-to-end modular architectures: the encoding of observations and the prediction of actions are handled by differentiable modules specialized to that space, with a single shared controller between them. To study the properties of such modular architectures in a controlled manner, we construct an environment with a compositional structure, where each instance of the environment is created by combining an observation, action, and instruction space from a large set of options. We demonstrate that through the use of modularity, agents can generalize to unseen combinations of observation, action and instruction spaces; even when the unseen combinations are more challenging. Moreover, we demonstrate that modularity enables quick integration of novel observation modalities, requiring only adaptation of the modules encoding the new observation.}
}
Endnote
%0 Conference Paper
%T Compositional Interfaces for Compositional Generalization
%A Jelena Luketina
%A Jack Lanchantin
%A Sainbayar Sukhbaatar
%A Arthur Szlam
%B Proceedings of The 3rd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2025
%E Vincenzo Lomonaco
%E Stefano Melacci
%E Tinne Tuytelaars
%E Sarath Chandar
%E Razvan Pascanu
%F pmlr-v274-luketina25a
%I PMLR
%P 692--709
%U https://proceedings.mlr.press/v274/luketina25a.html
%V 274
%X With recent work such as GATO (Reed et al., 2022) we see the development of agents that can accomplish a variety of tasks, while perceiving the world and acting in multiple observation and action spaces. We would want such agents to exhibit compositional generalization to unseen combinations of observation and action spaces, and adapt quickly to novel observation spaces by transferring knowledge. Yet, the specific setting requiring generalization to unseen compositions of observational modalities, action spaces, and instructions has not been systematically studied before. In this work, we demonstrate how such generalization can be achieved through the use of end-to-end modular architectures: the encoding of observations and the prediction of actions are handled by differentiable modules specialized to that space, with a single shared controller between them. To study the properties of such modular architectures in a controlled manner, we construct an environment with a compositional structure, where each instance of the environment is created by combining an observation, action, and instruction space from a large set of options. We demonstrate that through the use of modularity, agents can generalize to unseen combinations of observation, action and instruction spaces; even when the unseen combinations are more challenging. Moreover, we demonstrate that modularity enables quick integration of novel observation modalities, requiring only adaptation of the modules encoding the new observation.
APA
Luketina, J., Lanchantin, J., Sukhbaatar, S., & Szlam, A. (2025). Compositional Interfaces for Compositional Generalization. Proceedings of The 3rd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 274:692-709. Available from https://proceedings.mlr.press/v274/luketina25a.html.