Control Regularization for Reduced Variance Reinforcement Learning

Richard Cheng; Abhinav Verma; Gabor Orosz; Swarat Chaudhuri; Yisong Yue; Joel Burdick

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1141-1150, 2019.

Abstract

Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-cheng19a,
  title = 	 {Control Regularization for Reduced Variance Reinforcement Learning},
  author =       {Cheng, Richard and Verma, Abhinav and Orosz, Gabor and Chaudhuri, Swarat and Yue, Yisong and Burdick, Joel},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {1141--1150},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/cheng19a/cheng19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/cheng19a.html},
  abstract = 	 {Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.}
}

Endnote

%0 Conference Paper
%T Control Regularization for Reduced Variance Reinforcement Learning
%A Richard Cheng
%A Abhinav Verma
%A Gabor Orosz
%A Swarat Chaudhuri
%A Yisong Yue
%A Joel Burdick
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-cheng19a
%I PMLR
%P 1141--1150
%U https://proceedings.mlr.press/v97/cheng19a.html
%V 97
%X Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.

APA


Cheng, R., Verma, A., Orosz, G., Chaudhuri, S., Yue, Y. & Burdick, J. (2019). Control Regularization for Reduced Variance Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1141-1150. Available from https://proceedings.mlr.press/v97/cheng19a.html.
