Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

Gandharv Patil; Prashanth L.A.; Dheeraj Nagaraj; Doina Precup

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

Gandharv Patil, Prashanth L.A., Dheeraj Nagaraj, Doina Precup

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5438-5448, 2023.

Abstract

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm, when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O (1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-patil23a,
  title = 	 {Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation},
  author =       {Patil, Gandharv and L.A., Prashanth and Nagaraj, Dheeraj and Precup, Doina},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {5438--5448},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/patil23a/patil23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/patil23a.html},
  abstract = 	 {We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm, when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O (1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.}
}

Endnote

%0 Conference Paper
%T Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
%A Gandharv Patil
%A Prashanth L.A.
%A Dheeraj Nagaraj
%A Doina Precup
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-patil23a
%I PMLR
%P 5438--5448
%U https://proceedings.mlr.press/v206/patil23a.html
%V 206
%X We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm, when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O (1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.

APA


Patil, G., L.A., P., Nagaraj, D. & Precup, D.. (2023). Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:5438-5448 Available from https://proceedings.mlr.press/v206/patil23a.html.

Related Material

Download PDF