Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning

Evan Zheran Liu, Sahaana Suri, Tong Mu, Allan Zhou, Chelsea Finn
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:21997-22008, 2023.

Abstract

Whereas machine learning models typically learn language by directly training on language tasks (e.g., next-word prediction), language emerges in human children as a byproduct of solving non-language tasks (e.g., acquiring food). Motivated by this observation, we ask: can embodied reinforcement learning (RL) agents also indirectly learn language from non-language tasks? Learning to associate language with its meaning requires a dynamic environment with varied language. Therefore, we investigate this question in a multi-task environment with language that varies across the different tasks. Specifically, we design an office navigation environment, where the agent’s goal is to find a particular office, and office locations differ across buildings (i.e., tasks). Each building includes a floor plan with a simple language description of the goal office’s location, which can be visually read as an RGB image when visited. We find that RL agents can indeed learn language indirectly: agents trained with current meta-RL algorithms successfully generalize to reading floor plans with held-out layouts and language phrases, and quickly navigate to the correct office, despite receiving no direct language supervision.
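To make the setup concrete, below is a minimal, hypothetical sketch of the kind of multi-task environment the abstract describes, not the authors' actual implementation. All names and details (a 1-D corridor, a dictionary observation instead of RGB pixels) are illustrative assumptions; the key properties it preserves are that each task pairs a goal office with a floor-plan phrase, the phrase is only visible as part of the observation when the agent visits the plan, and reward comes solely from navigation, never from a language objective.

# Hypothetical sketch of the paper's setup (illustrative, not the authors' code):
# each reset samples a new "building" (task) with its own goal office and
# floor-plan phrase, as a meta-RL trial would.
import random

class OfficeNavEnv:
    """Toy 1-D corridor of offices with a floor-plan tile at position 0."""

    ACTIONS = {0: -1, 1: +1}  # move left / move right along the corridor

    def __init__(self, num_offices=4, seed=0):
        self.num_offices = num_offices
        self.rng = random.Random(seed)

    def reset(self):
        # Sample a fresh task (building): goal office and matching plan text.
        self.goal = self.rng.randrange(1, self.num_offices + 1)
        self.plan_text = f"the goal is office {self.goal}"  # varies per task
        self.pos = 0  # agent starts on the floor-plan tile
        return self._obs()

    def step(self, action):
        self.pos = max(0, min(self.num_offices, self.pos + self.ACTIONS[action]))
        done = (self.pos == self.goal)
        reward = 1.0 if done else 0.0  # sparse navigation reward only
        return self._obs(), reward, done, {}

    def _obs(self):
        # The plan's text is visible only while standing on the plan tile;
        # in the paper it is rendered into the RGB observation rather than
        # given as tokens, so language understanding must emerge indirectly.
        plan = self.plan_text if self.pos == 0 else ""
        return {"position": self.pos, "visible_text": plan}

# Usage: a random policy interacting with two sampled buildings (tasks).
env = OfficeNavEnv()
for trial in range(2):
    obs, done = env.reset(), False
    for _ in range(500):  # step cap so the demo always terminates
        obs, reward, done, _ = env.step(random.choice([0, 1]))
        if done:
            break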

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-liu23af,
  title     = {Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning},
  author    = {Liu, Evan Zheran and Suri, Sahaana and Mu, Tong and Zhou, Allan and Finn, Chelsea},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {21997--22008},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/liu23af/liu23af.pdf},
  url       = {https://proceedings.mlr.press/v202/liu23af.html}
}
Endnote
%0 Conference Paper
%T Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
%A Evan Zheran Liu
%A Sahaana Suri
%A Tong Mu
%A Allan Zhou
%A Chelsea Finn
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-liu23af
%I PMLR
%P 21997--22008
%U https://proceedings.mlr.press/v202/liu23af.html
%V 202
APA
Liu, E.Z., Suri, S., Mu, T., Zhou, A. & Finn, C. (2023). Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:21997-22008. Available from https://proceedings.mlr.press/v202/liu23af.html.