Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari
Proceedings of The 6th Conference on Robot Learning, PMLR 205:1401-1410, 2023.

Abstract

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming. Moreover, manually designing reward functions for every single desired skill is prohibitive. Prior works have targeted these challenges by learning goal-conditioned policies from offline datasets without manually specified rewards, through hindsight relabeling. These methods suffer from reward sparsity and fail on long-horizon tasks. In this work, we propose a novel self-supervised learning phase on the pre-collected dataset to capture the structure and dynamics of the environment, and shape a dense reward function for learning policies offline. We evaluate our method on three continuous control tasks, and show that it significantly outperforms existing approaches, especially on tasks that involve long-term planning.
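As context for the abstract: hindsight relabeling turns unrewarded trajectories into goal-reaching training data, but the resulting success reward is sparse, whereas the self-supervised phase described here shapes a dense, embedding-based signal. The sketch below is a minimal illustration of that contrast, assuming NumPy arrays for states and goals; the function names, success threshold, and embedding are hypothetical and not the authors' implementation.

# A minimal, hypothetical sketch of the contrast the abstract draws:
# sparse hindsight-relabeled rewards vs. a dense reward shaped from a
# self-supervised state embedding. All names, the embedding, and the
# success threshold are illustrative assumptions, not the paper's code.
import numpy as np

def sparse_reward(state, goal, eps=0.05):
    # Hindsight-style reward: 1 only when the relabeled goal is reached.
    # On long horizons almost every transition gets 0, so learning stalls.
    return float(np.linalg.norm(state - goal) < eps)

def dense_reward(embed, state, goal):
    # Shaped reward: negative distance in a learned embedding space, so
    # every transition carries a training signal toward the goal.
    return -float(np.linalg.norm(embed(state) - embed(goal)))

def hindsight_relabel(trajectory):
    # Relabel each (state, action) pair with a state actually reached
    # later in the same trajectory, turning any rollout into goal data.
    states, actions = zip(*trajectory)
    goal = states[-1]
    return [(s, a, goal) for s, a in zip(states, actions)]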

Cite this Paper

BibTeX
@InProceedings{pmlr-v205-mezghani23a,
  title     = {Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping},
  author    = {Mezghani, Lina and Sukhbaatar, Sainbayar and Bojanowski, Piotr and Lazaric, Alessandro and Alahari, Karteek},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {1401--1410},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/mezghani23a/mezghani23a.pdf},
  url       = {https://proceedings.mlr.press/v205/mezghani23a.html}
}
EndNote
%0 Conference Paper
%T Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping
%A Lina Mezghani
%A Sainbayar Sukhbaatar
%A Piotr Bojanowski
%A Alessandro Lazaric
%A Karteek Alahari
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-mezghani23a
%I PMLR
%P 1401--1410
%U https://proceedings.mlr.press/v205/mezghani23a.html
%V 205
APA
Mezghani, L., Sukhbaatar, S., Bojanowski, P., Lazaric, A. & Alahari, K. (2023). Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:1401-1410. Available from https://proceedings.mlr.press/v205/mezghani23a.html.
