On the Convergence of SARSA with Linear Function Approximation

Shangtong Zhang; Remi Tachet Des Combes; Romain Laroche

On the Convergence of SARSA with Linear Function Approximation

Shangtong Zhang, Remi Tachet Des Combes, Romain Laroche

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:41613-41646, 2023.

Abstract

SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region that we project into, provided that the the magnitude of the reward is not too large. Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA’s policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-zhang23al,
  title = 	 {On the Convergence of {SARSA} with Linear Function Approximation},
  author =       {Zhang, Shangtong and Tachet Des Combes, Remi and Laroche, Romain},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {41613--41646},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/zhang23al/zhang23al.pdf},
  url = 	 {https://proceedings.mlr.press/v202/zhang23al.html},
  abstract = 	 {SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region that we project into, provided that the the magnitude of the reward is not too large. Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA’s policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime.}
}

Endnote

%0 Conference Paper
%T On the Convergence of SARSA with Linear Function Approximation
%A Shangtong Zhang
%A Remi Tachet Des Combes
%A Romain Laroche
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-zhang23al
%I PMLR
%P 41613--41646
%U https://proceedings.mlr.press/v202/zhang23al.html
%V 202
%X SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region that we project into, provided that the the magnitude of the reward is not too large. Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA’s policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime.

APA

Zhang, S., Tachet Des Combes, R. & Laroche, R.. (2023). On the Convergence of SARSA with Linear Function Approximation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:41613-41646 Available from https://proceedings.mlr.press/v202/zhang23al.html.

On the Convergence of SARSA with Linear Function Approximation

Abstract

Cite this Paper

Related Material