Cautious Actor-Critic

Lingwei Zhu; Toshinori Kitamura; Matsubara Takamitsu

Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:220-235, 2021.

Abstract

The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can learn conservatively to better suit stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name cautious comes from its doubly conservative nature: we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.
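The two ingredients named in the abstract can be illustrated in a small tabular setting. The sketch below is not the CAC algorithm from the paper: it assumes a known, randomly generated MDP, replaces the learned critic with exact entropy-regularized policy evaluation, and uses a fixed interpolation coefficient. All names (`softmax_policy`, `soft_policy_evaluation`, `interpolate`, `run`, `tau`, `zeta`) are hypothetical. It only shows the generic pattern: an entropy-regularized critic whose greedy policy is a softmax, followed by a conservative-policy-iteration style mixture of the old and new policies.

```python
import numpy as np


def softmax_policy(q, tau):
    # Entropy-regularized greedy step: per state, the maximizer of
    # <pi, q> + tau * H(pi) over the simplex is softmax(q / tau).
    z = q / tau
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_policy_evaluation(P, R, pi, gamma, tau, iters=500):
    # Iterates the entropy-regularized Bellman operator for a fixed policy pi:
    # Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * sum_a' pi(a'|s') (Q(s',a') - tau log pi(a'|s'))
    n_states, n_actions = R.shape
    q = np.zeros((n_states, n_actions))
    log_pi = np.log(np.clip(pi, 1e-12, None))
    for _ in range(iters):
        v = np.sum(pi * (q - tau * log_pi), axis=1)  # soft state value, shape (S,)
        q = R + gamma * (P @ v)  # P has shape (S, A, S), so P @ v has shape (S, A)
    return q


def interpolate(pi_old, pi_new, zeta):
    # Conservative-policy-iteration style mixture; zeta in [0, 1] limits
    # how far the actor moves toward the new greedy policy in one step.
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return (1.0 - zeta) * pi_old + zeta * pi_new


def run(P, R, gamma=0.9, tau=0.1, zeta=0.2, n_iters=50):
    # Hypothetical driver: start uniform, then alternate critic and cautious actor updates.
    n_states, n_actions = R.shape
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(n_iters):
        q = soft_policy_evaluation(P, R, pi, gamma, tau)
        pi = interpolate(pi, softmax_policy(q, tau), zeta)
    return pi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 3
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)  # normalize rows into transition probabilities
    R = rng.random((S, A))
    print(run(P, R).round(3))
```

A smaller `zeta` makes each actor step more cautious at the cost of slower progress; with `zeta = 1` the loop reduces to plain entropy-regularized policy iteration. How the paper chooses the interpolation coefficient is not reproduced here.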

Cite this Paper

BibTeX


@InProceedings{pmlr-v157-zhu21a,
  title     = {Cautious Actor-Critic},
  author    = {Zhu, Lingwei and Kitamura, Toshinori and Takamitsu, Matsubara},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {220--235},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/zhu21a/zhu21a.pdf},
  url       = {https://proceedings.mlr.press/v157/zhu21a.html},
  abstract  = {The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can learn conservatively to better suit stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name cautious comes from its doubly conservative nature: we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.}
}

Endnote

%0 Conference Paper
%T Cautious Actor-Critic
%A Lingwei Zhu
%A Toshinori Kitamura
%A Matsubara Takamitsu
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-zhu21a
%I PMLR
%P 220--235
%U https://proceedings.mlr.press/v157/zhu21a.html
%V 157
%X The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can learn conservatively to better suit stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name cautious comes from its doubly conservative nature: we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.

APA


Zhu, L., Kitamura, T. & Takamitsu, M. (2021). Cautious Actor-Critic. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:220-235. Available from https://proceedings.mlr.press/v157/zhu21a.html.
