Meta-Model-Based Meta-Policy Optimization

Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:129-144, 2021.

Abstract

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
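For readers unfamiliar with the setup the abstract refers to, below is a minimal illustrative sketch of the generic loop that MBPO-style methods (Janner et al., 2019) and their meta-RL extensions follow: collect a small amount of real data on sampled tasks, fit a dynamics model shared across tasks, and improve a single policy using rollouts generated by that model. All task, model, and policy choices in the sketch are simplified placeholders invented for illustration; it is not the authors' M3PO implementation and carries none of the paper's theoretical guarantees.

# A minimal, self-contained sketch of a model-based meta-RL loop of the kind the
# abstract describes: learn a model shared across tasks, then optimize a single
# policy on cheap model rollouts, in the spirit of Janner et al. (2019).
# Every name here (ToyTask, MetaModel, the hill-climbing update, the 1-D tasks)
# is a hypothetical placeholder, not the authors' M3PO implementation.

import random

class ToyTask:
    """A 1-D point task; tasks in the distribution differ only in their goal."""
    def __init__(self, goal):
        self.goal, self.state = goal, 0.0
    def reset(self):
        self.state = 0.0
        return self.state
    def step(self, action):
        self.state += action                      # true dynamics: s' = s + a
        return self.state, -abs(self.state - self.goal)

class MetaModel:
    """A dynamics model shared across tasks: predicts s' from (s, a)."""
    def __init__(self):
        self.slope = 0.0                          # learned effect of the action
    def fit(self, transitions):
        num = sum(a * (s2 - s) for s, a, s2, _ in transitions)
        den = sum(a * a for _, a, _, _ in transitions) or 1.0
        self.slope = num / den                    # least-squares action effect
    def step(self, state, action):
        return state + self.slope * action

def make_policy(target):
    """Policy parameterised by a single scalar: move the state toward `target`."""
    return lambda s: max(-1.0, min(1.0, target - s))

def rollout_real(task, policy, horizon=10):
    """Collect real transitions with a little exploration noise."""
    data, s = [], task.reset()
    for _ in range(horizon):
        a = policy(s) + random.uniform(-0.1, 0.1)
        s2, r = task.step(a)
        data.append((s, a, s2, r))
        s = s2
    return data

def model_return(model, policy, goal, horizon=10):
    """Return of the policy on one task, evaluated entirely inside the model."""
    s, ret = 0.0, 0.0
    for _ in range(horizon):
        s = model.step(s, policy(s))
        ret += -abs(s - goal)
    return ret

goals = [-1.0, 0.5, 2.0]                          # training task distribution
model, target = MetaModel(), 0.0
for _ in range(30):
    # 1) Collect a small amount of real data on sampled tasks.
    real = []
    for g in random.sample(goals, 2):
        real += rollout_real(ToyTask(g), make_policy(target))
    # 2) Fit the shared (meta-)model on the real data.
    model.fit(real)
    # 3) Improve the policy using only model rollouts (hill climbing stands in
    #    for the policy-optimization step of an MBPO-style method).
    target = max(
        (target - 0.1, target, target + 0.1),
        key=lambda c: sum(model_return(model, make_policy(c), g) for g in goals),
    )
print("learned policy target:", round(target, 2))  # settles near the median goal

In this toy version the "meta-model" is just one shared dynamics model and the policy update is hill climbing; the paper's contribution is the analysis showing how far such model-based updates can be trusted and the M3PO algorithm built on that analysis.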

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-hiraoka21a,
  title     = {Meta-Model-Based Meta-Policy Optimization},
  author    = {Hiraoka, Takuya and Imagawa, Takahisa and Tangkaratt, Voot and Osa, Takayuki and Onishi, Takashi and Tsuruoka, Yoshimasa},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {129--144},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/hiraoka21a/hiraoka21a.pdf},
  url       = {https://proceedings.mlr.press/v157/hiraoka21a.html},
  abstract  = {Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.}
}
Endnote
%0 Conference Paper
%T Meta-Model-Based Meta-Policy Optimization
%A Takuya Hiraoka
%A Takahisa Imagawa
%A Voot Tangkaratt
%A Takayuki Osa
%A Takashi Onishi
%A Yoshimasa Tsuruoka
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-hiraoka21a
%I PMLR
%P 129--144
%U https://proceedings.mlr.press/v157/hiraoka21a.html
%V 157
%X Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
APA
Hiraoka, T., Imagawa, T., Tangkaratt, V., Osa, T., Onishi, T., & Tsuruoka, Y. (2021). Meta-Model-Based Meta-Policy Optimization. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:129-144. Available from https://proceedings.mlr.press/v157/hiraoka21a.html.
