Taming MAML: Efficient unbiased meta-reinforcement learning

Hao Liu, Richard Socher, Caiming Xiong
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4061-4071, 2019.

Abstract

While meta reinforcement learning (Meta-RL) methods have achieved remarkable success, obtaining correct and low-variance estimates of policy gradients remains a significant challenge. In particular, the need to estimate a large Hessian, poor sample efficiency, and unstable training continue to make Meta-RL difficult. We propose a surrogate objective function, named Taming MAML (TMAML), that adds control variates into gradient estimation via automatic differentiation. TMAML improves the quality of gradient estimation by reducing variance without introducing bias. We further propose a version of our method that extends the meta-learning framework to learning the control variates themselves, enabling efficient and scalable learning from a distribution of MDPs. We empirically compare our approach with MAML and other variance-bias trade-off methods, including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments.
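
The abstract's central mechanism, subtracting a control variate from a score-function (policy) gradient estimator so that variance drops while the expectation is unchanged, can be illustrated in a few lines. The sketch below is a generic, hypothetical example (a 1-D Gaussian policy, a toy reward, and a constant baseline), not the TMAML surrogate objective itself.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0                      # 1-D Gaussian policy N(mu, sigma^2)

def reward(a):
    return -(a - 3.0) ** 2                # toy reward, maximized at a = 3

def grad_estimates(n_samples, baseline):
    # Per-sample score-function estimates of d/d_mu E[reward(a)],
    # with a control variate (baseline) subtracted from the reward.
    a = rng.normal(mu, sigma, size=n_samples)
    score = (a - mu) / sigma ** 2         # d/d_mu log N(a; mu, sigma)
    return (reward(a) - baseline) * score

n = 100_000
plain = grad_estimates(n, baseline=0.0)                       # no control variate
mean_reward = reward(rng.normal(mu, sigma, size=n)).mean()    # estimate of E[reward]
with_cv = grad_estimates(n, baseline=mean_reward)             # constant control variate

print("no CV:   mean %.2f  variance %.1f" % (plain.mean(), plain.var()))
print("with CV: mean %.2f  variance %.1f" % (with_cv.mean(), with_cv.var()))
# Both means agree (the estimator stays unbiased, since the baseline multiplies
# a zero-mean score); the control-variate estimator has far lower variance.

In the paper's setting, the analogous correction is folded into a surrogate objective so that automatic differentiation through it yields unbiased, lower-variance meta-gradients, and a further variant meta-learns the control variates across the distribution of MDPs.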

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-liu19g,
  title     = {Taming {MAML}: Efficient unbiased meta-reinforcement learning},
  author    = {Liu, Hao and Socher, Richard and Xiong, Caiming},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4061--4071},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/liu19g/liu19g.pdf},
  url       = {https://proceedings.mlr.press/v97/liu19g.html},
  abstract  = {While meta reinforcement learning (Meta-RL) methods have achieved remarkable success, obtaining correct and low variance estimates for policy gradients remains a significant challenge. In particular, estimating a large Hessian, poor sample efficiency and unstable training continue to make Meta-RL difficult. We propose a surrogate objective function named, Taming MAML (TMAML), that adds control variates into gradient estimation via automatic differentiation. TMAML improves the quality of gradient estimation by reducing variance without introducing bias. We further propose a version of our method that extends the meta-learning framework to learning the control variates themselves, enabling efficient and scalable learning from a distribution of MDPs. We empirically compare our approach with MAML and other variance-bias trade-off methods including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments.}
}
Endnote
%0 Conference Paper
%T Taming MAML: Efficient unbiased meta-reinforcement learning
%A Hao Liu
%A Richard Socher
%A Caiming Xiong
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-liu19g
%I PMLR
%P 4061--4071
%U https://proceedings.mlr.press/v97/liu19g.html
%V 97
%X While meta reinforcement learning (Meta-RL) methods have achieved remarkable success, obtaining correct and low variance estimates for policy gradients remains a significant challenge. In particular, estimating a large Hessian, poor sample efficiency and unstable training continue to make Meta-RL difficult. We propose a surrogate objective function named, Taming MAML (TMAML), that adds control variates into gradient estimation via automatic differentiation. TMAML improves the quality of gradient estimation by reducing variance without introducing bias. We further propose a version of our method that extends the meta-learning framework to learning the control variates themselves, enabling efficient and scalable learning from a distribution of MDPs. We empirically compare our approach with MAML and other variance-bias trade-off methods including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments.
APA
Liu, H., Socher, R. & Xiong, C. (2019). Taming MAML: Efficient unbiased meta-reinforcement learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4061-4071. Available from https://proceedings.mlr.press/v97/liu19g.html.
