Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

Gandharv Patil, Prashanth L.A., Dheeraj Nagaraj, Doina Precup
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5438-5448, 2023.

Abstract

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O(1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.
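To make the setup concrete, the procedure the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the default of discarding the first half of the iterates, and the ridge-style form of the regularised update are assumptions made here for illustration; the paper should be consulted for the precise step-size and regularisation choices.

```python
import numpy as np

def tail_averaged_td(transitions, phi, d, alpha=0.1, gamma=0.95, k=None, reg=0.0):
    """Tail-averaged TD(0) with linear function approximation (illustrative sketch).

    transitions: iterable of (s, r, s_next) tuples from a single trajectory
    phi: feature map taking a state to a vector in R^d
    alpha: constant step size (chosen without eigenvalue information)
    k: number of initial iterates to discard before averaging;
       defaults to half the trajectory (an assumed, common choice)
    reg: regularisation strength; reg=0 recovers plain TD(0), reg>0 is a
         ridge-style stand-in for the paper's regularised variant
    """
    theta = np.zeros(d)
    iterates = []
    for (s, r, s_next) in transitions:
        x, x_next = phi(s), phi(s_next)
        # TD(0) temporal-difference error for the current transition
        delta = r + gamma * x_next.dot(theta) - x.dot(theta)
        # stochastic-approximation update, with optional shrinkage toward 0
        theta = theta + alpha * (delta * x - reg * theta)
        iterates.append(theta.copy())
    t = len(iterates)
    if k is None:
        k = t // 2
    # tail averaging: average only iterates k+1, ..., t, which is what
    # sharpens the decay of the initial error relative to full averaging
    return np.mean(iterates[k:], axis=0)
```

With one-hot (tabular) features the iteration is the familiar tabular TD(0); the tail average is simply a post-processing step over the stored iterates, so it adds no per-step cost beyond storing (or running-averaging) them.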

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-patil23a,
  title     = {Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation},
  author    = {Patil, Gandharv and L.A., Prashanth and Nagaraj, Dheeraj and Precup, Doina},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {5438--5448},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/patil23a/patil23a.pdf},
  url       = {https://proceedings.mlr.press/v206/patil23a.html},
  abstract  = {We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O(1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.}
}
Endnote
%0 Conference Paper
%T Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
%A Gandharv Patil
%A Prashanth L.A.
%A Dheeraj Nagaraj
%A Doina Precup
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-patil23a
%I PMLR
%P 5438--5448
%U https://proceedings.mlr.press/v206/patil23a.html
%V 206
%X We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O(1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.
APA
Patil, G., L.A., P., Nagaraj, D. & Precup, D. (2023). Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:5438-5448. Available from https://proceedings.mlr.press/v206/patil23a.html.