Meta-Model-Based Meta-Policy Optimization

Takuya Hiraoka; Takahisa Imagawa; Voot Tangkaratt; Takayuki Osa; Takashi Onishi; Yoshimasa Tsuruoka

Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:129-144, 2021.

Abstract

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.

Cite this Paper

BibTeX

@InProceedings{pmlr-v157-hiraoka21a,
  title = 	 {Meta-Model-Based Meta-Policy Optimization},
  author =       {Hiraoka, Takuya and Imagawa, Takahisa and Tangkaratt, Voot and Osa, Takayuki and Onishi, Takashi and Tsuruoka, Yoshimasa},
  booktitle = 	 {Proceedings of The 13th Asian Conference on Machine Learning},
  pages = 	 {129--144},
  year = 	 {2021},
  editor = 	 {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume = 	 {157},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--19 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v157/hiraoka21a/hiraoka21a.pdf},
  url = 	 {https://proceedings.mlr.press/v157/hiraoka21a.html},
  abstract = 	 {Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.}
}

Endnote

%0 Conference Paper
%T Meta-Model-Based Meta-Policy Optimization
%A Takuya Hiraoka
%A Takahisa Imagawa
%A Voot Tangkaratt
%A Takayuki Osa
%A Takashi Onishi
%A Yoshimasa Tsuruoka
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-hiraoka21a
%I PMLR
%P 129--144
%U https://proceedings.mlr.press/v157/hiraoka21a.html
%V 157
%X Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.

APA

Hiraoka, T., Imagawa, T., Tangkaratt, V., Osa, T., Onishi, T. & Tsuruoka, Y. (2021). Meta-Model-Based Meta-Policy Optimization. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:129-144. Available from https://proceedings.mlr.press/v157/hiraoka21a.html.
