Control Regularization for Reduced Variance Reinforcement Learning

Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel Burdick
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1141-1150, 2019.

Abstract

Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.
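The idea sketched in the abstract can be pictured concretely: the executed action is pulled toward the output of a fixed control prior, with a regularization weight that trades bias against variance. The Python sketch below is a minimal illustration under the assumption that functional regularization is realized as a weighted combination of the learned policy's action and the prior's action; the names (`control_prior`, `regularized_action`, `lam`, the gain `K`, and the stand-in RL action) are illustrative placeholders, not the authors' implementation, and the paper's adaptive tuning of the regularization weight is not shown.

```python
import numpy as np

def control_prior(state, K):
    """A hand-designed control prior, here a linear state-feedback law u = -K x.
    In the paper's setting such a prior may carry control-theoretic stability guarantees."""
    return -K @ state

def regularized_action(policy_action, prior_action, lam):
    """Functionally regularized action: a weighted combination of the learned
    policy's action and the prior's action. Larger lam pulls the behavior toward
    the prior (more bias, less variance); lam = 0 recovers the unregularized policy."""
    return (policy_action + lam * prior_action) / (1.0 + lam)

# Toy usage with a 2-D state, an assumed stabilizing gain, and a stand-in RL action.
state = np.array([0.5, -0.2])
K = np.array([[1.0, 0.3],
              [0.0, 1.2]])          # illustrative gain only
u_rl = np.array([0.1, -0.4])        # stand-in for the deep policy's output
u_prior = control_prior(state, K)
u = regularized_action(u_rl, u_prior, lam=2.0)
print(u)
```

In this picture, `lam = 0` recovers deep RL alone, while a large `lam` keeps the behavior close to the (possibly stability-certified) prior; choosing `lam` is the bias-variance trade-off the abstract refers to.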

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-cheng19a,
  title     = {Control Regularization for Reduced Variance Reinforcement Learning},
  author    = {Cheng, Richard and Verma, Abhinav and Orosz, Gabor and Chaudhuri, Swarat and Yue, Yisong and Burdick, Joel},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {1141--1150},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/cheng19a/cheng19a.pdf},
  url       = {https://proceedings.mlr.press/v97/cheng19a.html},
  abstract  = {Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.}
}
Endnote
%0 Conference Paper
%T Control Regularization for Reduced Variance Reinforcement Learning
%A Richard Cheng
%A Abhinav Verma
%A Gabor Orosz
%A Swarat Chaudhuri
%A Yisong Yue
%A Joel Burdick
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-cheng19a
%I PMLR
%P 1141--1150
%U https://proceedings.mlr.press/v97/cheng19a.html
%V 97
%X Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.
APA
Cheng, R., Verma, A., Orosz, G., Chaudhuri, S., Yue, Y. & Burdick, J. (2019). Control Regularization for Reduced Variance Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1141-1150. Available from https://proceedings.mlr.press/v97/cheng19a.html.