Quasi-Newton Trust Region Policy Optimization

Devesh K. Jha; Arvind U. Raghunathan; Diego Romeres

Proceedings of the Conference on Robot Learning, PMLR 100:945-954, 2020.

Abstract

We propose a trust region method for policy optimization that employs a Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks, including the lack of a stepsize selection criterion and slow convergence. We investigate the use of a trust region method using a dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance.
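
As a rough illustration of the dogleg step mentioned in the abstract, here is a minimal Python sketch of the textbook dogleg rule for a quadratic model with a Euclidean trust region. It is not the paper's implementation: the function name `dogleg_step`, the helper functions, and the toy matrices are all assumptions made for this example, and the paper's actual trust region constraint and Hessian update may differ. The sketch assumes a positive definite Hessian approximation `B` and its inverse `H` are both available, as a quasi-Newton scheme such as BFGS can maintain.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def norm(v):
    return math.sqrt(dot(v, v))


def dogleg_step(g, B, H, delta):
    """Textbook dogleg step (illustrative, not the paper's code).

    g: gradient, B: positive definite Hessian approximation,
    H: inverse of B, delta: Euclidean trust region radius.
    """
    # Full quasi-Newton step; accept it if it lies inside the region.
    p_newton = [-x for x in matvec(H, g)]
    if norm(p_newton) <= delta:
        return p_newton

    # Cauchy point: minimizer of the quadratic model along -g.
    alpha = dot(g, g) / dot(g, matvec(B, g))
    p_cauchy = [-alpha * x for x in g]
    n_cauchy = norm(p_cauchy)
    if n_cauchy >= delta:
        # Even the Cauchy point is outside: scale it back to the boundary.
        return [delta / n_cauchy * x for x in p_cauchy]

    # Otherwise walk from the Cauchy point toward the Newton point and
    # stop where the path crosses the boundary (positive quadratic root).
    d = [pn - pc for pn, pc in zip(p_newton, p_cauchy)]
    a = dot(d, d)
    b = 2.0 * dot(p_cauchy, d)
    c = dot(p_cauchy, p_cauchy) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pc + tau * di for pc, di in zip(p_cauchy, d)]


# Toy usage with a hypothetical diagonal model.
g = [1.0, 2.0]
B = [[1.0, 0.0], [0.0, 4.0]]
H = [[1.0, 0.0], [0.0, 0.25]]
step = dogleg_step(g, B, H, delta=1.0)
```

With a large radius the function returns the full quasi-Newton step; with a small one it falls back to a scaled steepest descent step, which is how the trust region supplies the step size that plain gradient descent lacks.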

Cite this Paper

BibTeX

@InProceedings{pmlr-v100-jha20a,
  title = 	 {Quasi-Newton Trust Region Policy Optimization},
  author =       {Jha, Devesh K. and Raghunathan, Arvind U. and Romeres, Diego},
  booktitle = 	 {Proceedings of the Conference on Robot Learning},
  pages = 	 {945--954},
  year = 	 {2020},
  editor = 	 {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume = 	 {100},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Oct--01 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v100/jha20a/jha20a.pdf},
  url = 	 {https://proceedings.mlr.press/v100/jha20a.html},
  abstract = 	 {We propose a trust region method for policy optimization that employs Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks including: lack of stepsize selection criterion, and slow convergence. We investigate the use of a trust region method using dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance.}
}

Endnote

%0 Conference Paper
%T Quasi-Newton Trust Region Policy Optimization
%A Devesh K. Jha
%A Arvind U. Raghunathan
%A Diego Romeres
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-jha20a
%I PMLR
%P 945--954
%U https://proceedings.mlr.press/v100/jha20a.html
%V 100
%X We propose a trust region method for policy optimization that employs Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks including: lack of stepsize selection criterion, and slow convergence. We investigate the use of a trust region method using dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance.

APA

Jha, D.K., Raghunathan, A.U. & Romeres, D. (2020). Quasi-Newton Trust Region Policy Optimization. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:945-954. Available from https://proceedings.mlr.press/v100/jha20a.html.
