Contrastive Example-Based Control

Kyle Beltran Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan Salakhutdinov, Sergey Levine, Chelsea Finn
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:155-169, 2023.

Abstract

While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples from the transition dynamics and examples of high-return states. These methods typically learn a reward function from high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to these transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.
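
To make the core idea concrete, the sketch below (PyTorch; not the authors' implementation) illustrates how an implicit model of multi-step transitions can be trained with an InfoNCE-style contrastive objective over (state, action) embeddings and future-state embeddings, so that its logits can be reused as Q-values for example-based control. All names (ContrastiveCritic, phi, psi, embed_dim) and architectural details are illustrative assumptions; the exact objective and sampling scheme in the paper may differ.

# Minimal sketch (not the authors' code) of a contrastive critic whose logits
# can serve as Q-values for example-based control. Names and architecture
# choices here are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveCritic(nn.Module):
    """Implicit model f(s, a, s*) = phi(s, a)^T psi(s*)."""

    def __init__(self, state_dim: int, action_dim: int, embed_dim: int = 64):
        super().__init__()
        # Embeds the (state, action) pair.
        self.phi = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )
        # Embeds a candidate future state (or a success example).
        self.psi = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, state, action, future_state):
        sa = self.phi(torch.cat([state, action], dim=-1))  # (B, d)
        g = self.psi(future_state)                          # (B, d)
        return sa @ g.t()                                   # (B, B) pairwise logits


def contrastive_loss(critic, state, action, future_state):
    """InfoNCE-style objective: each (s, a) is paired with the future state
    from its own trajectory (the diagonal of the logit matrix) as the positive;
    the other future states in the batch act as negatives."""
    logits = critic(state, action, future_state)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Toy shapes only; real data would come from the offline dataset.
    B, S, A = 32, 10, 4
    critic = ContrastiveCritic(S, A)
    s, a, s_future = torch.randn(B, S), torch.randn(B, A), torch.randn(B, S)
    loss = contrastive_loss(critic, s, a, s_future)
    loss.backward()
    print(float(loss))

At decision time, one would score candidate actions by evaluating f(s, a, s*) against the provided success examples s* and choosing the highest-scoring action; the paper's actual policy-extraction procedure may differ from this simplified sketch.
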

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-hatch23a,
  title     = {Contrastive Example-Based Control},
  author    = {Hatch, Kyle Beltran and Eysenbach, Benjamin and Rafailov, Rafael and Yu, Tianhe and Salakhutdinov, Ruslan and Levine, Sergey and Finn, Chelsea},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {155--169},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/hatch23a/hatch23a.pdf},
  url       = {https://proceedings.mlr.press/v211/hatch23a.html},
  abstract  = {While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples from the transition dynamics and examples of high-return states. These methods typically learn a reward function from high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to these transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.}
}
Endnote
%0 Conference Paper
%T Contrastive Example-Based Control
%A Kyle Beltran Hatch
%A Benjamin Eysenbach
%A Rafael Rafailov
%A Tianhe Yu
%A Ruslan Salakhutdinov
%A Sergey Levine
%A Chelsea Finn
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-hatch23a
%I PMLR
%P 155--169
%U https://proceedings.mlr.press/v211/hatch23a.html
%V 211
%X While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples from the transition dynamics and examples of high-return states. These methods typically learn a reward function from high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to these transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.
APA
Hatch, K.B., Eysenbach, B., Rafailov, R., Yu, T., Salakhutdinov, R., Levine, S. & Finn, C. (2023). Contrastive Example-Based Control. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:155-169. Available from https://proceedings.mlr.press/v211/hatch23a.html.