Representation Matters: Offline Pretraining for Sequential Decision Making

Mengjiao Yang, Ofir Nachum
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11784-11794, 2021.

Abstract

The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms. This research area, known as offline RL, has largely focused on offline policy optimization, aiming to find a return-maximizing policy exclusively from offline data. In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making. We aim to answer the question, what unsupervised objectives applied to offline datasets are able to learn state representations which elevate performance on downstream tasks, whether those downstream tasks be online RL, imitation learning from expert demonstrations, or even offline policy optimization based on the same offline dataset? Through a variety of experiments utilizing standard offline RL datasets, we find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms that otherwise yield mediocre performance on their own. Extensive ablations further provide insights into what components of these unsupervised objectives – e.g., reward prediction, continuous or discrete representations, pretraining or finetuning – are most important and in which settings.
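To make the setup concrete, below is a minimal, hypothetical sketch of the two-phase recipe the abstract describes: first pretrain a state encoder on an offline transition dataset with an unsupervised objective (here, joint next-state and reward prediction, reward prediction being one of the components the paper ablates), then reuse that representation for a downstream policy learner (behavioral cloning in this sketch). The network sizes, synthetic data, and specific losses are illustrative assumptions, not the paper's exact architecture or training protocol.

import torch
import torch.nn as nn

# Assumed dimensions for a MuJoCo-like continuous-control task (illustrative only).
state_dim, action_dim, repr_dim = 17, 6, 64

# State encoder to be pretrained, plus an auxiliary head used only during pretraining.
encoder = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim))
dynamics_head = nn.Sequential(nn.Linear(repr_dim + action_dim, 256), nn.ReLU(),
                              nn.Linear(256, state_dim + 1))  # predicts next state and reward

# Synthetic stand-in for an offline dataset of (s, a, r, s') transitions.
N = 1024
s, a = torch.randn(N, state_dim), torch.randn(N, action_dim)
r, s_next = torch.randn(N, 1), torch.randn(N, state_dim)

# Phase 1: unsupervised pretraining of the representation (no policy learning yet).
pretrain_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(dynamics_head.parameters()), lr=1e-3)
for _ in range(100):
    z = encoder(s)
    pred = dynamics_head(torch.cat([z, a], dim=-1))
    loss = nn.functional.mse_loss(pred, torch.cat([s_next, r], dim=-1))
    pretrain_opt.zero_grad()
    loss.backward()
    pretrain_opt.step()

# Phase 2: downstream policy learning (behavioral cloning here) on top of the
# pretrained representation; the encoder could equally be finetuned rather than frozen.
policy_head = nn.Sequential(nn.Linear(repr_dim, 256), nn.ReLU(), nn.Linear(256, action_dim))
policy_opt = torch.optim.Adam(policy_head.parameters(), lr=1e-3)
for _ in range(100):
    with torch.no_grad():  # frozen-encoder variant
        z = encoder(s)
    bc_loss = nn.functional.mse_loss(policy_head(z), a)
    policy_opt.zero_grad()
    bc_loss.backward()
    policy_opt.step()

The same pretrained encoder could instead feed an online RL or offline policy optimization algorithm; the pretraining-versus-finetuning and objective choices are exactly the ablation axes the abstract mentions.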

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-yang21h,
  title     = {Representation Matters: Offline Pretraining for Sequential Decision Making},
  author    = {Yang, Mengjiao and Nachum, Ofir},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11784--11794},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/yang21h/yang21h.pdf},
  url       = {https://proceedings.mlr.press/v139/yang21h.html},
  abstract  = {The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms. This research area, known as offline RL, has largely focused on offline policy optimization, aiming to find a return-maximizing policy exclusively from offline data. In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making. We aim to answer the question, what unsupervised objectives applied to offline datasets are able to learn state representations which elevate performance on downstream tasks, whether those downstream tasks be online RL, imitation learning from expert demonstrations, or even offline policy optimization based on the same offline dataset? Through a variety of experiments utilizing standard offline RL datasets, we find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms that otherwise yield mediocre performance on their own. Extensive ablations further provide insights into what components of these unsupervised objectives – e.g., reward prediction, continuous or discrete representations, pretraining or finetuning – are most important and in which settings.}
}
Endnote
%0 Conference Paper
%T Representation Matters: Offline Pretraining for Sequential Decision Making
%A Mengjiao Yang
%A Ofir Nachum
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-yang21h
%I PMLR
%P 11784--11794
%U https://proceedings.mlr.press/v139/yang21h.html
%V 139
%X The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms. This research area, known as offline RL, has largely focused on offline policy optimization, aiming to find a return-maximizing policy exclusively from offline data. In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making. We aim to answer the question, what unsupervised objectives applied to offline datasets are able to learn state representations which elevate performance on downstream tasks, whether those downstream tasks be online RL, imitation learning from expert demonstrations, or even offline policy optimization based on the same offline dataset? Through a variety of experiments utilizing standard offline RL datasets, we find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms that otherwise yield mediocre performance on their own. Extensive ablations further provide insights into what components of these unsupervised objectives – e.g., reward prediction, continuous or discrete representations, pretraining or finetuning – are most important and in which settings.
APA
Yang, M. & Nachum, O. (2021). Representation Matters: Offline Pretraining for Sequential Decision Making. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11784-11794. Available from https://proceedings.mlr.press/v139/yang21h.html.