Residual learning and context encoding for adaptive offline-to-online reinforcement learning

Mohammadreza Nakhaei, Aidan Scannell, Joni Pajarinen
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:1107-1121, 2024.

Abstract

Offline reinforcement learning (RL) allows learning sequential behavior from fixed datasets. Since offline datasets do not cover all possible situations, many methods collect additional data during online fine-tuning to improve performance. In general, these methods assume that the transition dynamics remain the same during both the offline and online phases of training. However, in many real-world applications, such as outdoor construction and navigation over rough terrain, it is common for the transition dynamics to vary between the offline and online phases. Moreover, the dynamics may vary during online training. To address this problem of changing dynamics from offline to online RL, we propose a residual learning approach that infers dynamics changes to correct the outputs of the offline solution. In the online fine-tuning phase, we train a context encoder to learn a representation that is consistent within the current online learning environment while being predictive of the transition dynamics. Experiments in D4RL MuJoCo environments, modified to support dynamics changes upon environment resets, show that our approach can adapt to these dynamics changes and generalize to unseen perturbations in a sample-efficient way, whereas comparison methods cannot.
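The two ingredients named in the abstract, a residual correction applied on top of a frozen offline policy and a context encoder trained to predict transitions under the current dynamics, can be illustrated with the minimal PyTorch sketch below. This is an illustrative sketch only, not the paper's implementation: all module names, dimensions, and the specific loss are assumptions.

# Minimal sketch (assumed architecture, not the paper's code): a frozen offline
# policy whose action is corrected by a residual network conditioned on a
# learned context vector; the context encoder is trained to predict next states
# so that its representation captures the current transition dynamics.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes a short history of (s, a, s') transitions into a context vector z."""
    def __init__(self, obs_dim, act_dim, ctx_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, ctx_dim),
        )

    def forward(self, s, a, s_next):
        # Inputs have shape (batch, window, dim); average per-transition
        # encodings over the history window to get one context vector.
        return self.net(torch.cat([s, a, s_next], dim=-1)).mean(dim=-2)

class ResidualPolicy(nn.Module):
    """Adds a context-conditioned correction to a frozen offline policy's action."""
    def __init__(self, offline_policy, obs_dim, act_dim, ctx_dim=8, hidden=128):
        super().__init__()
        self.offline_policy = offline_policy  # trained offline, kept frozen online
        self.residual = nn.Sequential(
            nn.Linear(obs_dim + act_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, s, z):
        with torch.no_grad():
            a_offline = self.offline_policy(s)
        correction = self.residual(torch.cat([s, a_offline, z], dim=-1))
        return a_offline + correction

def context_prediction_loss(encoder, dynamics_head, s_hist, a_hist, s_next_hist, s, a, s_next):
    """Train the encoder so z is predictive of next states under the current dynamics."""
    z = encoder(s_hist, a_hist, s_next_hist)
    pred = dynamics_head(torch.cat([s, a, z], dim=-1))
    return ((pred - s_next) ** 2).mean()

Keeping the offline policy frozen and learning only a small context-conditioned residual is one plausible way to realize the sample efficiency the abstract claims: online data is spent adapting to the dynamics change rather than relearning the behavior from scratch.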

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-nakhaeinezhadfard24a,
  title     = {Residual learning and context encoding for adaptive offline-to-online reinforcement learning},
  author    = {Nakhaei, Mohammadreza and Scannell, Aidan and Pajarinen, Joni},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {1107--1121},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/nakhaeinezhadfard24a/nakhaeinezhadfard24a.pdf},
  url       = {https://proceedings.mlr.press/v242/nakhaeinezhadfard24a.html},
  abstract  = {Offline reinforcement learning (RL) allows learning sequential behavior from fixed datasets. Since offline datasets do not cover all possible situations, many methods collect additional data during online fine-tuning to improve performance. In general, these methods assume that the transition dynamics remain the same during both the offline and online phases of training. However, in many real-world applications, such as outdoor construction and navigation over rough terrain, it is common for the transition dynamics to vary between the offline and online phases. Moreover, the dynamics may vary during the online training. To address this problem of changing dynamics from offline to online RL we propose a residual learning approach that infers dynamics changes to correct the outputs of the offline solution. At the online fine-tuning phase, we train a context encoder to learn a representation that is consistent inside the current online learning environment while being able to predict dynamic transitions. Experiments in D4RL MuJoCo environments, modified to support dynamics changes upon environment resets, show that our approach can adapt to these dynamic changes and generalize to unseen perturbations in a sample-efficient way, whilst comparison methods cannot.}
}
Endnote
%0 Conference Paper
%T Residual learning and context encoding for adaptive offline-to-online reinforcement learning
%A Mohammadreza Nakhaei
%A Aidan Scannell
%A Joni Pajarinen
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-nakhaeinezhadfard24a
%I PMLR
%P 1107--1121
%U https://proceedings.mlr.press/v242/nakhaeinezhadfard24a.html
%V 242
%X Offline reinforcement learning (RL) allows learning sequential behavior from fixed datasets. Since offline datasets do not cover all possible situations, many methods collect additional data during online fine-tuning to improve performance. In general, these methods assume that the transition dynamics remain the same during both the offline and online phases of training. However, in many real-world applications, such as outdoor construction and navigation over rough terrain, it is common for the transition dynamics to vary between the offline and online phases. Moreover, the dynamics may vary during the online training. To address this problem of changing dynamics from offline to online RL we propose a residual learning approach that infers dynamics changes to correct the outputs of the offline solution. At the online fine-tuning phase, we train a context encoder to learn a representation that is consistent inside the current online learning environment while being able to predict dynamic transitions. Experiments in D4RL MuJoCo environments, modified to support dynamics changes upon environment resets, show that our approach can adapt to these dynamic changes and generalize to unseen perturbations in a sample-efficient way, whilst comparison methods cannot.
APA
Nakhaei, M., Scannell, A., & Pajarinen, J. (2024). Residual learning and context encoding for adaptive offline-to-online reinforcement learning. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:1107-1121. Available from https://proceedings.mlr.press/v242/nakhaeinezhadfard24a.html.
