Goal Misgeneralization in Deep Reinforcement Learning

Lauro Langosco Di Langosco, Jack Koch, Lee D Sharkey, Jacob Pfau, David Krueger
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12004-12019, 2022.

Abstract

We study goal misgeneralization, a type of out-of-distribution robustness failure in reinforcement learning (RL). Goal misgeneralization occurs when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We provide the first explicit empirical demonstrations of goal misgeneralization and present a partial characterization of its causes.
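The phenomenon the abstract describes can be reproduced in miniature. One of the paper's demonstrations uses CoinRun, where the coin sits at the end of every training level, so the agent learns "go right" rather than "get the coin." The sketch below is a toy analogue of that setup, not the paper's procgen experiments: a tabular Q-learner in a 1-D corridor whose rewarded cell is always the right end during training. The corridor, hyperparameters, and episode budget are illustrative assumptions.

import random

random.seed(0)

N = 10  # corridor length; states are positions 0 .. N-1

def clip(p):
    return max(0, min(N - 1, p))

def run_episode(policy, goal, start=N // 2, max_steps=50):
    """Roll out a deterministic policy; True iff the agent reaches `goal`."""
    pos = start
    for _ in range(max_steps):
        pos = clip(pos + policy(pos))
        if pos == goal:
            return True
    return False

# Tabular Q-learning over actions {-1 (left), +1 (right)}.
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.3

for _ in range(2000):
    # Training distribution: the rewarded cell is ALWAYS the right end.
    goal, pos = N - 1, random.randrange(N - 1)
    for _ in range(50):
        if random.random() < eps:
            a = random.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda b: Q[(pos, b)])
        nxt = clip(pos + a)
        r = 1.0 if nxt == goal else 0.0
        target = r + gamma * max(Q[(nxt, b)] for b in (-1, 1))
        Q[(pos, a)] += alpha * (target - Q[(pos, a)])
        pos = nxt
        if pos == goal:
            break

def greedy(s):
    return max((-1, 1), key=lambda b: Q[(s, b)])

# Capabilities transfer (the agent still navigates competently), but the
# learned goal is "go right", not "go to the rewarded cell":
print("goal at right end (as in training):", run_episode(greedy, goal=N - 1))  # True
print("goal at left end (novel at test):  ", run_episode(greedy, goal=0))      # False

At test time the agent still acts coherently, moving without getting stuck, yet it walks directly away from the relocated goal: a capability that generalizes paired with a goal that does not.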

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-langosco22a,
  title     = {Goal Misgeneralization in Deep Reinforcement Learning},
  author    = {Langosco, Lauro Langosco Di and Koch, Jack and Sharkey, Lee D and Pfau, Jacob and Krueger, David},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {12004--12019},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/langosco22a/langosco22a.pdf},
  url       = {https://proceedings.mlr.press/v162/langosco22a.html},
  abstract  = {We study goal misgeneralization, a type of out-of-distribution robustness failure in reinforcement learning (RL). Goal misgeneralization occurs when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We provide the first explicit empirical demonstrations of goal misgeneralization and present a partial characterization of its causes.}
}
Endnote
%0 Conference Paper
%T Goal Misgeneralization in Deep Reinforcement Learning
%A Lauro Langosco Di Langosco
%A Jack Koch
%A Lee D Sharkey
%A Jacob Pfau
%A David Krueger
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-langosco22a
%I PMLR
%P 12004--12019
%U https://proceedings.mlr.press/v162/langosco22a.html
%V 162
%X We study goal misgeneralization, a type of out-of-distribution robustness failure in reinforcement learning (RL). Goal misgeneralization occurs when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We provide the first explicit empirical demonstrations of goal misgeneralization and present a partial characterization of its causes.
APA
Langosco, L.L.D., Koch, J., Sharkey, L.D., Pfau, J. & Krueger, D. (2022). Goal Misgeneralization in Deep Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:12004-12019. Available from https://proceedings.mlr.press/v162/langosco22a.html.