Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Hang Wang, Sen Lin, Junshan Zhang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:35989-36019, 2023.

Abstract

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start RL can improve quickly in some cases but stagnate in others, especially when function approximation is used. To this end, the primary objective of this work is to build a fundamental understanding of whether and when online learning can be significantly accelerated by a warm-start policy from offline RL. Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation, and study the impact of the approximation errors on the finite-time learning performance with inaccurate Actor/Critic updates. Under some general technical conditions, we derive upper bounds, which shed light on achieving the desired finite-time learning performance in the Warm-Start A-C algorithm. In particular, our findings reveal that it is essential to reduce the algorithm bias in online learning. We also obtain lower bounds on the sub-optimality gap of the Warm-Start A-C algorithm to quantify the impact of the bias and error propagation.
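
To make the setting concrete, below is a minimal sketch of a warm-start actor-critic loop: the actor is initialized from a prior policy (standing in for the output of offline training) and then fine-tuned online with an inexact Critic update (TD(0)) and an inexact Actor update (policy gradient). This is an illustrative sketch only, assuming a small synthetic tabular MDP with a softmax actor; the names (prior_policy, theta, alpha_actor, alpha_critic) and step sizes are hypothetical and this is not the function-approximation algorithm analyzed in the paper.

# Minimal sketch of a warm-start actor-critic loop on a small synthetic MDP.
# Illustrative only: tabular softmax actor, TD(0) critic, random MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Random MDP: P[s, a] is a distribution over next states; R[s, a] is the reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Warm start: actor logits are initialized from a prior policy (a stand-in
# for a policy obtained from offline RL).
prior_policy = rng.dirichlet(np.ones(n_actions), size=n_states)
theta = np.log(prior_policy + 1e-8)   # Actor parameters (logits per state)
V = np.zeros(n_states)                # Critic estimate of the value function

alpha_critic, alpha_actor, horizon = 0.1, 0.05, 5000
s = rng.integers(n_states)
for t in range(horizon):
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic update: TD(0) step; the TD error also serves as an inexact
    # advantage estimate for the Actor, so Critic error propagates to the Actor.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * td_error

    # Actor update: policy-gradient step with the softmax score function e_a - pi.
    grad_log = -pi
    grad_log[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log

    s = s_next

print("Value estimates after warm-start online fine-tuning:", np.round(V, 3))

In this sketch the quality of the warm start enters through the initial logits, while the inexact TD and policy-gradient steps play the role of the Actor/Critic approximation errors whose propagation the paper bounds.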

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-wang23q,
  title     = {Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap},
  author    = {Wang, Hang and Lin, Sen and Zhang, Junshan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {35989--36019},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/wang23q/wang23q.pdf},
  url       = {https://proceedings.mlr.press/v202/wang23q.html}
}
Endnote
%0 Conference Paper
%T Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
%A Hang Wang
%A Sen Lin
%A Junshan Zhang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-wang23q
%I PMLR
%P 35989--36019
%U https://proceedings.mlr.press/v202/wang23q.html
%V 202
APA
Wang, H., Lin, S. & Zhang, J. (2023). Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:35989-36019. Available from https://proceedings.mlr.press/v202/wang23q.html.