Provably Efficient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case

Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2764-2772, 2021.

Abstract

Risk-sensitivity plays a central role in artificial intelligence safety. In this paper, we study the global convergence of the actor-critic algorithm for risk-sensitive reinforcement learning (RSRL) with exponential utility, which remains challenging for policy optimization as it lacks the linearity needed to formulate a policy gradient. To bypass this issue of nonlinearity, we resort to the equivalence between RSRL and robust adversarial reinforcement learning (RARL), which is formulated as a zero-sum Markov game with a hypothetical adversary. In particular, the Nash equilibrium (NE) of such a game yields the optimal policy for RSRL, which is provably robust. We focus on a simple yet fundamental setting known as the linear-quadratic (LQ) game. To attain the optimal policy, we develop a nested natural actor-critic algorithm, which provably converges to the NE of the LQ game at a sublinear rate, thus solving both RSRL and RARL. To the best of our knowledge, the proposed nested actor-critic algorithm appears to be the first model-free policy optimization algorithm that provably attains the optimal policy for RSRL and RARL in the LQ setting, which sheds light on more general settings.
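
For intuition, the following is a rough sketch of the setup described in the abstract; it is not taken from the paper, and the notation (risk parameter \beta, cost c, dynamics matrices A, B, D, weight matrices Q, R^u, R^v, feedback gains K, L) is assumed here purely for illustration.

% Risk-sensitive objective with exponential utility; the second expression is the
% standard small-\beta expansion, showing how \beta > 0 penalizes cost variance.
J_\beta(\pi) = \frac{1}{\beta}\,\log \mathbb{E}_\pi\!\left[\exp\!\Big(\beta \sum_{t \ge 0} c(x_t, u_t)\Big)\right]
             \approx \mathbb{E}_\pi\!\Big[\sum_{t \ge 0} c(x_t, u_t)\Big]
             + \frac{\beta}{2}\,\mathrm{Var}_\pi\!\Big[\sum_{t \ge 0} c(x_t, u_t)\Big].

% Zero-sum LQ game with a hypothetical adversary v_t and linear state-feedback
% policies u_t = -K x_t (protagonist) and v_t = -L x_t (adversary):
x_{t+1} = A x_t + B u_t + D v_t,
\qquad
\min_{K}\ \max_{L}\ \mathbb{E}\!\left[\sum_{t \ge 0}\big(x_t^\top Q x_t + u_t^\top R^u u_t - v_t^\top R^v v_t\big)\right].

Under a formulation of this kind, the NE gain pair (K*, L*) of the game yields the protagonist's policy u_t = -K* x_t, which is the risk-sensitive (and hence robust) optimal controller the abstract refers to; a nested scheme in this spirit repeatedly drives the inner player's problem to near-optimality for a fixed outer gain before taking a natural-gradient step on the outer gain.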

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-zhang21f, title = {Provably Efficient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case}, author = {Zhang, Yufeng and Yang, Zhuoran and Wang, Zhaoran}, booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics}, pages = {2764--2772}, year = {2021}, editor = {Banerjee, Arindam and Fukumizu, Kenji}, volume = {130}, series = {Proceedings of Machine Learning Research}, month = {13--15 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v130/zhang21f/zhang21f.pdf}, url = {https://proceedings.mlr.press/v130/zhang21f.html}, abstract = {Risk-sensitivity plays a central role in artificial intelligence safety. In this paper, we study the global convergence of the actor-critic algorithm for risk-sensitive reinforcement learning (RSRL) with exponential utility, which remains challenging for policy optimization as it lacks the linearity needed to formulate a policy gradient. To bypass this issue of nonlinearity, we resort to the equivalence between RSRL and robust adversarial reinforcement learning (RARL), which is formulated as a zero-sum Markov game with a hypothetical adversary. In particular, the Nash equilibrium (NE) of such a game yields the optimal policy for RSRL, which is provably robust. We focus on a simple yet fundamental setting known as the linear-quadratic (LQ) game. To attain the optimal policy, we develop a nested natural actor-critic algorithm, which provably converges to the NE of the LQ game at a sublinear rate, thus solving both RSRL and RARL. To the best of our knowledge, the proposed nested actor-critic algorithm appears to be the first model-free policy optimization algorithm that provably attains the optimal policy for RSRL and RARL in the LQ setting, which sheds light on more general settings.} }
Endnote
%0 Conference Paper %T Provably Efficient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case %A Yufeng Zhang %A Zhuoran Yang %A Zhaoran Wang %B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2021 %E Arindam Banerjee %E Kenji Fukumizu %F pmlr-v130-zhang21f %I PMLR %P 2764--2772 %U https://proceedings.mlr.press/v130/zhang21f.html %V 130 %X Risk-sensitivity plays a central role in artificial intelligence safety. In this paper, we study the global convergence of the actor-critic algorithm for risk-sensitive reinforcement learning (RSRL) with exponential utility, which remains challenging for policy optimization as it lacks the linearity needed to formulate a policy gradient. To bypass this issue of nonlinearity, we resort to the equivalence between RSRL and robust adversarial reinforcement learning (RARL), which is formulated as a zero-sum Markov game with a hypothetical adversary. In particular, the Nash equilibrium (NE) of such a game yields the optimal policy for RSRL, which is provably robust. We focus on a simple yet fundamental setting known as the linear-quadratic (LQ) game. To attain the optimal policy, we develop a nested natural actor-critic algorithm, which provably converges to the NE of the LQ game at a sublinear rate, thus solving both RSRL and RARL. To the best of our knowledge, the proposed nested actor-critic algorithm appears to be the first model-free policy optimization algorithm that provably attains the optimal policy for RSRL and RARL in the LQ setting, which sheds light on more general settings.
APA
Zhang, Y., Yang, Z. & Wang, Z. (2021). Provably Efficient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2764-2772. Available from https://proceedings.mlr.press/v130/zhang21f.html.