A finite-sample analysis of multi-step temporal difference estimates
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:612-624, 2023.
Abstract
We consider the problem of estimating the value function of an infinite-horizon γ-discounted Markov reward process (MRP). We establish non-asymptotic guarantees for a general family of multi-step temporal difference (TD) estimates, including canonical K-step look-ahead TD for K=1,2,… and the TD(λ) family for λ∈[0,1) as special cases. Our bounds capture the dependence of these estimates both on the variance, as defined by Bellman fluctuations, and on the bias arising from possible model mis-specification. Our results reveal that the variance component shows limited sensitivity to the look-ahead that defines the estimator, whereas increasing the look-ahead can reduce the bias term. This highlights the benefit of using a larger look-ahead: it reduces bias but need not increase the variance.
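For concreteness, the estimators the abstract refers to can be sketched in a few lines of tabular Python. The sketch below is illustrative only, not the paper's code: the trajectory format (state, reward, next-state triples), the function names, and the step size α are assumptions made for the example; it shows the K-step look-ahead target and the TD(λ) target as a geometric mixture of K-step returns.

```python
import numpy as np

def k_step_return(trajectory, V, t, K, gamma):
    """K-step look-ahead return from time t: up to K discounted rewards,
    then a bootstrap from the current value estimate V."""
    T = len(trajectory)
    steps = min(K, T - t)  # truncate if the trajectory ends early
    G = sum(gamma ** i * trajectory[t + i][1] for i in range(steps))
    last_next_state = trajectory[t + steps - 1][2]
    return G + gamma ** steps * V[last_next_state]

def lambda_return(trajectory, V, t, lam, gamma):
    """TD(lambda) target at time t: (1 - lam) * lam^(K-1) mixture of the
    K-step returns, with the leftover mass on the longest available return."""
    K_max = len(trajectory) - t
    G = sum((1 - lam) * lam ** (K - 1) * k_step_return(trajectory, V, t, K, gamma)
            for K in range(1, K_max))
    G += lam ** (K_max - 1) * k_step_return(trajectory, V, t, K_max, gamma)
    return G

def td_pass(trajectory, V, gamma, alpha, K=None, lam=None):
    """One sweep of tabular TD updates toward either a K-step or a lambda target."""
    for t, (s, _, _) in enumerate(trajectory):
        target = (k_step_return(trajectory, V, t, K, gamma) if K is not None
                  else lambda_return(trajectory, V, t, lam, gamma))
        V[s] += alpha * (target - V[s])
    return V
```

Under this sketch, K=1 (or λ=0) recovers one-step TD, while larger K (or λ closer to 1) places more weight on observed rewards and less on the bootstrapped value, which is the bias-variance trade-off the paper's bounds quantify.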