Provably Eﬃcient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2764-2772, 2021.
Risk-sensitivity plays a central role in artiﬁcial intelligence safety. In this paper, we study the global convergence of the actor-critic algorithm for risk-sensitive reinforcement learning (RSRL) with exponential utility, which remains challenging for policy optimization as it lacks the linearity needed to formulate policy gradient. To bypass such an issue of nonlinearity, we resort to the equivalence between RSRL and robust adversarial reinforcement learning (RARL), which is formulated as a zero-sum Markov game with a hypothetical adversary. In particular, the Nash equilibrium (NE) of such a game yields the optimal policy for RSRL, which is provably robust. We focus on a simple yet fundamental setting known as linear-quadratic (LQ) game. To attain the optimal policy, we develop a nested natural actor-critic algorithm, which provably converges to the NE of the LQ game at a sublinear rate, thus solving both RSRL and RARL. To the best knowledge, the proposed nested actor-critic algorithm appears to be the ﬁrst model-free policy optimization algorithm that provably attains the optimal policy for RSRL and RARL in the LQ setting, which sheds light on more general settings.