On task-relevant loss functions in meta-reinforcement learning

Jaeuk Shin, Giho Kim, Howon Lee, Joonho Han, Insoon Yang
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:1174-1186, 2024.

Abstract

Designing meta-reinforcement learning (meta-RL) algorithms that use data efficiently remains a central challenge for their successful real-world application. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. Unlike standard model-based approaches to meta-RL, our method exploits value information to rapidly capture the decision-critical parts of the environment. The key component of our method is the loss function used to learn both the task inference module and the system model. This loss systematically couples the model discrepancy with the value estimate, enabling the proposed algorithm to learn the policy and the task inference module from significantly less data than existing meta-RL algorithms. The proposed method is evaluated on high-dimensional robotic control tasks, empirically verifying its effectiveness in extracting the information indispensable for solving the tasks from observations in a sample-efficient manner.
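To make the idea of coupling model discrepancy with value estimates concrete, the snippet below is a minimal, hypothetical PyTorch sketch of one way such a value-aware, task-relevant loss could look. The module names, signatures, and the coupling coefficient are assumptions introduced for illustration only; they do not reproduce the paper's actual loss.

```python
# A minimal, hypothetical sketch (PyTorch) of a value-aware, task-relevant loss.
# The module names, signatures, and the coupling coefficient are illustrative
# assumptions, not the paper's actual formulation.
import torch
import torch.nn.functional as F

def task_relevant_model_loss(encoder, dynamics_model, critic, batch, value_coef=1.0):
    """Couple one-step model discrepancy with a value discrepancy, so that
    prediction errors in decision-critical states are penalized more heavily."""
    s, a, s_next = batch["obs"], batch["act"], batch["next_obs"]

    z = encoder(batch["context"])           # task latent inferred from context transitions
    s_next_pred = dynamics_model(s, a, z)   # predicted next state under the inferred task

    # Plain model discrepancy in state space.
    model_err = F.mse_loss(s_next_pred, s_next, reduction="none").mean(dim=-1)

    # Value discrepancy: how much the prediction error matters to the critic.
    # Gradients flow through s_next_pred into the model and the encoder; only
    # model/encoder parameters are assumed to be updated with this loss.
    with torch.no_grad():
        v_target = critic(s_next, z)
    value_err = (critic(s_next_pred, z) - v_target).pow(2).squeeze(-1)

    return (model_err + value_coef * value_err).mean()
```

In this sketch the critic merely scores the model's predictions; how the actual method weights and combines these terms is specified in the paper itself.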

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-shin24a,
  title     = {On task-relevant loss functions in meta-reinforcement learning},
  author    = {Shin, Jaeuk and Kim, Giho and Lee, Howon and Han, Joonho and Yang, Insoon},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {1174--1186},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/shin24a/shin24a.pdf},
  url       = {https://proceedings.mlr.press/v242/shin24a.html},
  abstract  = {Designing a competent meta-reinforcement learning (meta-RL) algorithm in terms of data usage remains a central challenge to be tackled for its successful real-world applications. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. As opposed to the standard model-based approaches to meta-RL, our method exploits the value information in order to rapidly capture the decision-critical part of the environment. The key component of our method is the loss function for learning both the task inference module and the system model. This systematically couples the model discrepancy and the value estimate, thereby enabling our proposed algorithm to learn the policy and task inference module with a significantly smaller amount of data compared to the existing meta-RL algorithms. The proposed method is evaluated in high-dimensional robotic control, empirically verifying its effectiveness in extracting information indispensable for solving the tasks from observations in a sample-efficient manner.}
}
Endnote
%0 Conference Paper
%T On task-relevant loss functions in meta-reinforcement learning
%A Jaeuk Shin
%A Giho Kim
%A Howon Lee
%A Joonho Han
%A Insoon Yang
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-shin24a
%I PMLR
%P 1174--1186
%U https://proceedings.mlr.press/v242/shin24a.html
%V 242
%X Designing a competent meta-reinforcement learning (meta-RL) algorithm in terms of data usage remains a central challenge to be tackled for its successful real-world applications. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. As opposed to the standard model-based approaches to meta-RL, our method exploits the value information in order to rapidly capture the decision-critical part of the environment. The key component of our method is the loss function for learning both the task inference module and the system model. This systematically couples the model discrepancy and the value estimate, thereby enabling our proposed algorithm to learn the policy and task inference module with a significantly smaller amount of data compared to the existing meta-RL algorithms. The proposed method is evaluated in high-dimensional robotic control, empirically verifying its effectiveness in extracting information indispensable for solving the tasks from observations in a sample-efficient manner.
APA
Shin, J., Kim, G., Lee, H., Han, J. & Yang, I. (2024). On task-relevant loss functions in meta-reinforcement learning. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:1174-1186. Available from https://proceedings.mlr.press/v242/shin24a.html.