Quasi-Newton Trust Region Policy Optimization

Devesh K. Jha, Arvind U. Raghunathan, Diego Romeres
Proceedings of the Conference on Robot Learning, PMLR 100:945-954, 2020.

Abstract

We propose a trust region method for policy optimization that employs a Quasi-Newton approximation of the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous control, and it has achieved state-of-the-art performance across a wide range of tasks. However, it suffers from several drawbacks, including the lack of a step-size selection criterion and slow convergence. We investigate a trust region method for policy optimization that uses a dogleg step together with a Quasi-Newton approximation of the Hessian. We demonstrate through numerical experiments on a wide range of challenging continuous control tasks that this choice is sample-efficient and improves performance.
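At its core, the approach combines the classical dogleg step for the trust-region subproblem with a quasi-Newton approximation of the Hessian. As a rough illustration only (a generic textbook dogleg step, not the authors' QNTRPO implementation), a minimal NumPy sketch might look like the following, where the names `dogleg_step`, `g` (policy gradient), `B` (quasi-Newton Hessian approximation), and `delta` (trust-region radius) are illustrative assumptions:

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Classic dogleg step for the trust-region subproblem
        min_p  g^T p + 0.5 p^T B p   subject to  ||p|| <= delta,
    where B is a positive-definite (quasi-Newton) Hessian approximation.
    Generic sketch for illustration; not the paper's implementation."""
    # Full quasi-Newton step: the unconstrained minimizer of the quadratic model.
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton

    # Cauchy point: minimizer of the model along the steepest-descent direction.
    p_cauchy = -(g @ g) / (g @ B @ g) * g
    if np.linalg.norm(p_cauchy) >= delta:
        # Even the Cauchy point lies outside the region: scale back to the boundary.
        return -delta * g / np.linalg.norm(g)

    # Otherwise, follow the dogleg path from the Cauchy point toward the Newton
    # point and stop where it crosses the trust-region boundary: find tau in
    # [0, 1] with ||p_cauchy + tau * (p_newton - p_cauchy)|| = delta.
    d = p_newton - p_cauchy
    a = d @ d
    b = 2.0 * (p_cauchy @ d)
    c = p_cauchy @ p_cauchy - delta ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_cauchy + tau * d
```

The step interpolates between the (scaled) steepest-descent direction and the quasi-Newton direction, which is what lets a trust-region method avoid an explicit step-size selection rule.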

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-jha20a,
  title     = {Quasi-Newton Trust Region Policy Optimization},
  author    = {Jha, Devesh K. and Raghunathan, Arvind U. and Romeres, Diego},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {945--954},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/jha20a/jha20a.pdf},
  url       = {https://proceedings.mlr.press/v100/jha20a.html}
}
Endnote
%0 Conference Paper
%T Quasi-Newton Trust Region Policy Optimization
%A Devesh K. Jha
%A Arvind U. Raghunathan
%A Diego Romeres
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-jha20a
%I PMLR
%P 945--954
%U https://proceedings.mlr.press/v100/jha20a.html
%V 100
APA
Jha, D.K., Raghunathan, A.U. & Romeres, D. (2020). Quasi-Newton Trust Region Policy Optimization. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:945-954. Available from https://proceedings.mlr.press/v100/jha20a.html.