An Analytical Update Rule for General Policy Optimization

Hepeng Li; Nicholas Clavette; Haibo He

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12696-12716, 2022.

Abstract

We present an analytical policy update rule that is independent of parametric function approximators. The policy update rule is suitable for optimizing general stochastic policies and has a monotonic improvement guarantee. It is derived from a closed-form solution to trust-region optimization using calculus of variations, following a new theoretical result that tightens existing bounds for policy improvement using trust-region methods. The update rule builds a connection between policy search methods and value function methods. Moreover, off-policy reinforcement learning algorithms can be derived from the update rule since it does not require integrating over on-policy states. In addition, the update rule extends immediately to cooperative multi-agent systems when policy updates are performed by one agent at a time.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-li22d,
  title     = {An Analytical Update Rule for General Policy Optimization},
  author    = {Li, Hepeng and Clavette, Nicholas and He, Haibo},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {12696--12716},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/li22d/li22d.pdf},
  url       = {https://proceedings.mlr.press/v162/li22d.html},
  abstract  = {We present an analytical policy update rule that is independent of parametric function approximators. The policy update rule is suitable for optimizing general stochastic policies and has a monotonic improvement guarantee. It is derived from a closed-form solution to trust-region optimization using calculus of variation, following a new theoretical result that tightens existing bounds for policy improvement using trust-region methods. The update rule builds a connection between policy search methods and value function methods. Moreover, off-policy reinforcement learning algorithms can be derived from the update rule since it does not need to compute integration over on-policy states. In addition, the update rule extends immediately to cooperative multi-agent systems when policy updates are performed by one agent at a time.}
}

Endnote

%0 Conference Paper
%T An Analytical Update Rule for General Policy Optimization
%A Hepeng Li
%A Nicholas Clavette
%A Haibo He
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-li22d
%I PMLR
%P 12696--12716
%U https://proceedings.mlr.press/v162/li22d.html
%V 162
%X We present an analytical policy update rule that is independent of parametric function approximators. The policy update rule is suitable for optimizing general stochastic policies and has a monotonic improvement guarantee. It is derived from a closed-form solution to trust-region optimization using calculus of variation, following a new theoretical result that tightens existing bounds for policy improvement using trust-region methods. The update rule builds a connection between policy search methods and value function methods. Moreover, off-policy reinforcement learning algorithms can be derived from the update rule since it does not need to compute integration over on-policy states. In addition, the update rule extends immediately to cooperative multi-agent systems when policy updates are performed by one agent at a time.

APA


Li, H., Clavette, N. & He, H. (2022). An Analytical Update Rule for General Policy Optimization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:12696-12716. Available from https://proceedings.mlr.press/v162/li22d.html.
