Hessian Aided Policy Gradient

Zebang Shen, Alejandro Ribeiro, Hamed Hassani, Hui Qian, Chao Mi
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5729-5738, 2019.

Abstract

Reducing the variance of policy gradient estimators has long been a focus of reinforcement learning research. While classic algorithms like REINFORCE find an $\epsilon$-approximate first-order stationary point in $O(1/\epsilon^4)$ random trajectory simulations, no provable improvement in this complexity has been made so far. This paper presents a Hessian-aided policy gradient method with the first improved sample complexity of $O(1/\epsilon^3)$. While our method exploits information from the policy Hessian, it can be implemented in linear time with respect to the parameter dimension and is hence applicable to sophisticated DNN parameterizations. Simulations on standard tasks validate the efficiency of our method.
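The linear-time claim rests on the standard trick of computing Hessian-vector products with two rounds of backpropagation, so the full policy Hessian is never materialized. The sketch below illustrates that mechanism in PyTorch on a toy quadratic objective; it is only an illustration under that assumption, not the authors' implementation, and the names (hvp, objective, theta) are placeholders.

import torch

def hvp(objective, params, vec):
    """Hessian-vector product H @ vec via double backprop (illustrative sketch)."""
    # First backward pass: build a gradient we can differentiate again.
    grads = torch.autograd.grad(objective(params), params, create_graph=True)
    # Contract the gradient with the fixed vector, then differentiate once more.
    grad_dot_vec = torch.dot(grads[0], vec)
    return torch.autograd.grad(grad_dot_vec, params)[0]

# Toy check against a quadratic whose Hessian is known in closed form.
d = 5
A = torch.randn(d, d)
A = A @ A.T                      # symmetric, so the Hessian of the quadratic is A
theta = torch.randn(d, requires_grad=True)
v = torch.randn(d)

objective = lambda p: 0.5 * p @ A @ p
print(torch.allclose(hvp(objective, theta, v), A @ v, atol=1e-4))  # expect True

Each call costs two backward passes, i.e. time and memory linear in the parameter dimension, which is what makes Hessian information affordable for DNN policies.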

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-shen19d,
  title     = {Hessian Aided Policy Gradient},
  author    = {Shen, Zebang and Ribeiro, Alejandro and Hassani, Hamed and Qian, Hui and Mi, Chao},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {5729--5738},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/shen19d/shen19d.pdf},
  url       = {https://proceedings.mlr.press/v97/shen19d.html},
  abstract  = {Reducing the variance of estimators for policy gradient has long been the focus of reinforcement learning research. While classic algorithms like REINFORCE find an $\epsilon$-approximate first-order stationary point in $O(1/\epsilon^4)$ random trajectory simulations, no provable improvement on the complexity has been made so far. This paper presents a Hessian aided policy gradient method with the first improved sample complexity of $O(1/\epsilon^3)$. While our method exploits information from the policy Hessian, it can be implemented in linear time with respect to the parameter dimension and is hence applicable to sophisticated DNN parameterization. Simulations on standard tasks validate the efficiency of our method.}
}
Endnote
%0 Conference Paper
%T Hessian Aided Policy Gradient
%A Zebang Shen
%A Alejandro Ribeiro
%A Hamed Hassani
%A Hui Qian
%A Chao Mi
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-shen19d
%I PMLR
%P 5729--5738
%U https://proceedings.mlr.press/v97/shen19d.html
%V 97
%X Reducing the variance of estimators for policy gradient has long been the focus of reinforcement learning research. While classic algorithms like REINFORCE find an $\epsilon$-approximate first-order stationary point in $O(1/\epsilon^4)$ random trajectory simulations, no provable improvement on the complexity has been made so far. This paper presents a Hessian aided policy gradient method with the first improved sample complexity of $O(1/\epsilon^3)$. While our method exploits information from the policy Hessian, it can be implemented in linear time with respect to the parameter dimension and is hence applicable to sophisticated DNN parameterization. Simulations on standard tasks validate the efficiency of our method.
APA
Shen, Z., Ribeiro, A., Hassani, H., Qian, H. & Mi, C. (2019). Hessian Aided Policy Gradient. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5729-5738. Available from https://proceedings.mlr.press/v97/shen19d.html.
