Momentum-Based Policy Gradient Methods

Feihu Huang; Shangqian Gao; Jian Pei; Heng Huang

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4422-4433, 2020.

Abstract

In this paper, we propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance-reduced technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance-reduced technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the nonconcave performance function, while requiring only one trajectory at each iteration. In particular, we present a non-adaptive version of the IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we use four benchmark tasks to demonstrate the effectiveness of our algorithms.
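
The abstract does not spell out the update rule, so the following is only a minimal sketch of what a momentum-based variance-reduced gradient estimate with an importance weight might look like. It assumes a STORM-style recursion; the function name `momentum_vr_estimate`, its parameters, and the toy numbers are all hypothetical and are not taken from the paper.

```python
def momentum_vr_estimate(u_prev, g_curr, g_prev, is_weight, beta):
    """Hypothetical helper: one momentum-based variance-reduced estimate.

    Assumed form (not stated in the abstract):
        u_t = beta * g_curr + (1 - beta) * (u_prev + g_curr - is_weight * g_prev)

    u_prev    -- previous gradient estimate u_{t-1}
    g_curr    -- stochastic gradient of one trajectory at the current parameters
    g_prev    -- stochastic gradient of the same trajectory at the previous parameters
    is_weight -- importance weight correcting g_prev for the change of policy
    beta      -- momentum parameter in [0, 1]
    """
    return [
        beta * gc + (1.0 - beta) * (up + gc - is_weight * gp)
        for up, gc, gp in zip(u_prev, g_curr, g_prev)
    ]


# Toy numbers, chosen only for illustration.
u = momentum_vr_estimate(
    u_prev=[0.5, -1.0],
    g_curr=[1.0, 2.0],
    g_prev=[0.8, 1.5],
    is_weight=0.9,
    beta=0.25,
)
```

Under this assumed form, `beta = 1` reduces the estimate to the plain single-trajectory stochastic gradient, while smaller values of `beta` reuse more of the previous estimate, which is how such a scheme could work with one trajectory per iteration instead of a large batch.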

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-huang20a,
  title = 	 {Momentum-Based Policy Gradient Methods},
  author =       {Huang, Feihu and Gao, Shangqian and Pei, Jian and Huang, Heng},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {4422--4433},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/huang20a/huang20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/huang20a.html},
  abstract = 	 {In the paper, we propose a class of efficient momentum-based policy gradient methods for the model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast important-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduced technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduced technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the nonconcave performance function, which only require one trajectory at each iteration. In particular, we present a non-adaptive version of IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms.}
}

Endnote

%0 Conference Paper
%T Momentum-Based Policy Gradient Methods
%A Feihu Huang
%A Shangqian Gao
%A Jian Pei
%A Heng Huang
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-huang20a
%I PMLR
%P 4422--4433
%U https://proceedings.mlr.press/v119/huang20a.html
%V 119
%X In the paper, we propose a class of efficient momentum-based policy gradient methods for the model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast important-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduced technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduced technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the nonconcave performance function, which only require one trajectory at each iteration. In particular, we present a non-adaptive version of IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms.

APA


Huang, F., Gao, S., Pei, J. & Huang, H. (2020). Momentum-Based Policy Gradient Methods. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4422-4433. Available from https://proceedings.mlr.press/v119/huang20a.html.
