Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning

Austin W. Hanjie, Victor Y Zhong, Karthik Narasimhan
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:4051-4062, 2021.

Abstract

We investigate the use of natural language to drive the generalization of control policies and introduce the new multi-task environment Messenger with free-form text manuals describing the environment dynamics. Unlike previous work, Messenger does not assume prior knowledge connecting text and state observations; the control policy must simultaneously ground the game manual to entity symbols and dynamics in the environment. We develop a new model, EMMA (Entity Mapper with Multi-modal Attention), which uses an entity-conditioned attention module that allows for selective focus over relevant descriptions in the manual for each entity in the environment. EMMA is end-to-end differentiable and learns a latent grounding of entities and dynamics from text to observations using only environment rewards. EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining a 40% higher win rate compared to multiple baselines. However, win rate on the hardest stage of Messenger remains low (10%), demonstrating the need for additional work in this direction.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-hanjie21a,
  title     = {Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning},
  author    = {Hanjie, Austin W. and Zhong, Victor Y and Narasimhan, Karthik},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {4051--4062},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/hanjie21a/hanjie21a.pdf},
  url       = {https://proceedings.mlr.press/v139/hanjie21a.html},
  abstract  = {We investigate the use of natural language to drive the generalization of control policies and introduce the new multi-task environment Messenger with free-form text manuals describing the environment dynamics. Unlike previous work, Messenger does not assume prior knowledge connecting text and state observations; the control policy must simultaneously ground the game manual to entity symbols and dynamics in the environment. We develop a new model, EMMA (Entity Mapper with Multi-modal Attention), which uses an entity-conditioned attention module that allows for selective focus over relevant descriptions in the manual for each entity in the environment. EMMA is end-to-end differentiable and learns a latent grounding of entities and dynamics from text to observations using only environment rewards. EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining a 40% higher win rate compared to multiple baselines. However, win rate on the hardest stage of Messenger remains low (10%), demonstrating the need for additional work in this direction.}
}
Endnote
%0 Conference Paper
%T Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning
%A Austin W. Hanjie
%A Victor Y Zhong
%A Karthik Narasimhan
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-hanjie21a
%I PMLR
%P 4051--4062
%U https://proceedings.mlr.press/v139/hanjie21a.html
%V 139
%X We investigate the use of natural language to drive the generalization of control policies and introduce the new multi-task environment Messenger with free-form text manuals describing the environment dynamics. Unlike previous work, Messenger does not assume prior knowledge connecting text and state observations; the control policy must simultaneously ground the game manual to entity symbols and dynamics in the environment. We develop a new model, EMMA (Entity Mapper with Multi-modal Attention), which uses an entity-conditioned attention module that allows for selective focus over relevant descriptions in the manual for each entity in the environment. EMMA is end-to-end differentiable and learns a latent grounding of entities and dynamics from text to observations using only environment rewards. EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining a 40% higher win rate compared to multiple baselines. However, win rate on the hardest stage of Messenger remains low (10%), demonstrating the need for additional work in this direction.
APA
Hanjie, A.W., Zhong, V.Y. & Narasimhan, K. (2021). Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:4051-4062. Available from https://proceedings.mlr.press/v139/hanjie21a.html.