A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Jalaj Bhandari, Daniel Russo, Raghav Singal
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1691-1692, 2018.

Abstract

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to the study of a variant of Q-learning applied to optimal stopping problems.
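For orientation, the algorithm analyzed here is the standard TD(0) update with linear function approximation; the display below states its textbook form for context and is not reproduced from the paper, and the notation ($\phi$, $\theta_t$, $\alpha_t$, $\gamma$) follows common usage rather than the paper's own. After observing a transition from state $s_t$ to $s_{t+1}$ with reward $r_t$, the weights are updated as
\[
\theta_{t+1} \;=\; \theta_t \;+\; \alpha_t \bigl( r_t + \gamma\, \phi(s_{t+1})^{\top}\theta_t - \phi(s_t)^{\top}\theta_t \bigr)\, \phi(s_t),
\]
where $\phi(\cdot)$ is a fixed feature map, $\alpha_t$ is the step size, and $\gamma$ is the discount factor, so the value function is approximated by $V_{\theta}(s) = \phi(s)^{\top}\theta$.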

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-bhandari18a,
  title     = {A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation},
  author    = {Bhandari, Jalaj and Russo, Daniel and Singal, Raghav},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages     = {1691--1692},
  year      = {2018},
  editor    = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume    = {75},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v75/bhandari18a/bhandari18a.pdf},
  url       = {https://proceedings.mlr.press/v75/bhandari18a.html},
  abstract  = {Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a \emph{simple and explicit finite time analysis} of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to the study of a variant of Q-learning applied to optimal stopping problems.}
}
Endnote
%0 Conference Paper
%T A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
%A Jalaj Bhandari
%A Daniel Russo
%A Raghav Singal
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-bhandari18a
%I PMLR
%P 1691--1692
%U https://proceedings.mlr.press/v75/bhandari18a.html
%V 75
%X Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a \emph{simple and explicit finite time analysis} of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to the study of a variant of Q-learning applied to optimal stopping problems.
APA
Bhandari, J., Russo, D. & Singal, R. (2018). A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1691-1692. Available from https://proceedings.mlr.press/v75/bhandari18a.html.