On the Rate of Convergence and Error Bounds for LSTD(λ)

Manel Tagorti, Bruno Scherrer
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1521-1529, 2015.

Abstract

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight on the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
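For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal NumPy sketch of the standard LSTD(λ) estimator with eligibility traces, not the authors' implementation; the function name lstd_lambda, the feature map phi, and the single-trajectory input format are illustrative assumptions.

```python
import numpy as np

def lstd_lambda(trajectory, phi, gamma, lam):
    """Minimal LSTD(lambda) sketch (after Boyan, 2002).

    trajectory: list of (state, reward, next_state) transitions collected
        along one trajectory while following the fixed policy.
    phi: feature map, state -> np.ndarray of dimension d.
    gamma: discount factor in [0, 1).
    lam: eligibility-trace parameter lambda in [0, 1].
    Returns theta such that phi(s).dot(theta) approximates the value V(s).
    """
    d = phi(trajectory[0][0]).shape[0]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for s, r, s_next in trajectory:
        z = lam * gamma * z + phi(s)                       # decay and refresh the trace
        A += np.outer(z, phi(s) - gamma * phi(s_next))     # accumulate the LSTD matrix
        b += z * r                                         # accumulate the LSTD vector
    return np.linalg.solve(A, b)  # use a pseudo-inverse instead if A is singular
```

With lam = 0 this reduces to LSTD(0), the special case whose error bound was derived by Lazaric et al. (2012) and which the paper extends to λ ∈ (0,1).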

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-tagorti15,
  title     = {On the Rate of Convergence and Error Bounds for LSTD($\lambda$)},
  author    = {Tagorti, Manel and Scherrer, Bruno},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {1521--1529},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/tagorti15.pdf},
  url       = {https://proceedings.mlr.press/v37/tagorti15.html},
  abstract  = {We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight on the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.}
}
APA
Tagorti, M., & Scherrer, B. (2015). On the Rate of Convergence and Error Bounds for LSTD(λ). Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1521-1529. Available from https://proceedings.mlr.press/v37/tagorti15.html.
