On the Rate of Convergence and Error Bounds for LSTD(λ)

Manel Tagorti; Bruno Scherrer

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1521-1529, 2015.

Abstract

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ∈(0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the one derived by Lazaric et al. (2012) in the specific case where λ=0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
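For readers unfamiliar with the algorithm, the following is a minimal sketch of the standard LSTD(λ) estimate computed from a single trajectory. It is not taken from the paper: the function names `lstd_lambda` and `solve`, the argument layout, and the plain Gaussian-elimination solver are illustrative assumptions, and the paper's own notation and assumptions may differ.

```python
def lstd_lambda(features, rewards, gamma, lam):
    """Estimate theta such that V(x) ~ phi(x)^T theta from one trajectory.

    features: feature vectors phi(x_0), ..., phi(x_n)  (n + 1 entries)
    rewards:  rewards r_0, ..., r_{n-1}                (n entries)
    """
    d = len(features[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    z = [0.0] * d  # eligibility trace
    for t, r in enumerate(rewards):
        phi, phi_next = features[t], features[t + 1]
        # Trace update: z_t = gamma * lam * z_{t-1} + phi(x_t)
        z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
        for i in range(d):
            b[i] += z[i] * r
            for j in range(d):
                # A accumulates z_t (phi(x_t) - gamma * phi(x_{t+1}))^T
                A[i][j] += z[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (hypothetical helper, used here only to avoid external dependencies)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; more samples are needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


# Toy example (an assumption, not from the paper): a deterministic
# two-state cycle 0 -> 1 -> 0 -> ... with one-hot features, reward 1
# when leaving state 0 and reward 0 when leaving state 1.
features = [[1.0, 0.0], [0.0, 1.0]] * 2 + [[1.0, 0.0]]
rewards = [1.0, 0.0, 1.0, 0.0]
theta = lstd_lambda(features, rewards, gamma=0.5, lam=0.5)
```

In this toy cycle the transitions are deterministic and the features are one-hot, so the estimate should coincide with the exact values V(0) = 4/3 and V(1) = 2/3 for any λ. The effect of λ that the abstract discusses only shows up with stochastic transitions and a feature space that cannot represent the value function exactly.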

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-tagorti15,
  title = 	 {On the Rate of Convergence and Error Bounds for LSTD($\lambda$)},
  author = 	 {Tagorti, Manel and Scherrer, Bruno},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1521--1529},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/tagorti15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/tagorti15.html},
  abstract = 	 {We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ∈(0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the one derived by Lazaric et al. (2012) in the specific case where λ=0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.}
}

Endnote

%0 Conference Paper
%T On the Rate of Convergence and Error Bounds for LSTD(λ)
%A Manel Tagorti
%A Bruno Scherrer
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-tagorti15
%I PMLR
%P 1521--1529
%U https://proceedings.mlr.press/v37/tagorti15.html
%V 37
%X We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ∈(0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the one derived by Lazaric et al. (2012) in the specific case where λ=0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.

RIS


TY  - CPAPER
TI  - On the Rate of Convergence and Error Bounds for LSTD(λ)
AU  - Manel Tagorti
AU  - Bruno Scherrer
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-tagorti15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1521
EP  - 1529
L1  - http://proceedings.mlr.press/v37/tagorti15.pdf
UR  - https://proceedings.mlr.press/v37/tagorti15.html
AB  - We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ∈(0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the one derived by Lazaric et al. (2012) in the specific case where λ=0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
ER  -

APA


Tagorti, M. & Scherrer, B. (2015). On the Rate of Convergence and Error Bounds for LSTD(λ). Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1521-1529. Available from https://proceedings.mlr.press/v37/tagorti15.html.
