Quasi Newton Temporal Difference Learning

Arash Givchi; Maziar Palhang

Quasi Newton Temporal Difference Learning

Arash Givchi, Maziar Palhang

Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:159-172, 2015.

Abstract

Fast convergent and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD, have faster convergence rate but they are computationally slow. On the other hand, there are algorithms that are computationally fast but with slower convergence rate, among them are TD, RG, GTD2 and TDC. This paper presents a regularized Quasi Newton Temporal Difference learning algorithm which uses the second-order information while maintaining a fast convergence rate. In simple language, we combine the idea of TD learning with Quasi Newton algorithm SGD-QN. We explore the development of QNTD algorithm and discuss its convergence properties. We support our ideas with empirical results on 4 standard benchmarks in reinforcement learning literature with 2 small problems, Random Walk and Boyan chain and 2 bigger problems, cart-pole and linked-pole balancing. Empirical studies show that QNTD speeds up convergence and provides better accuracy in comparison to the conventional TD.

Cite this Paper

BibTeX


@InProceedings{pmlr-v39-givchi14,
  title = 	 {Quasi Newton Temporal Difference Learning},
  author = 	 {Givchi, Arash and Palhang, Maziar},
  booktitle = 	 {Proceedings of the Sixth Asian Conference on Machine Learning},
  pages = 	 {159--172},
  year = 	 {2015},
  editor = 	 {Phung, Dinh and Li, Hang},
  volume = 	 {39},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Nha Trang City, Vietnam},
  month = 	 {26--28 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v39/givchi14.pdf},
  url = 	 {https://proceedings.mlr.press/v39/givchi14.html},
  abstract = 	 {Fast convergent and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD, have faster convergence rate but they are computationally slow. On the other hand, there are algorithms that are computationally fast but with slower convergence rate, among them are TD, RG, GTD2 and TDC. This paper presents a regularized Quasi Newton Temporal Difference learning algorithm which uses the second-order information while maintaining a fast convergence rate. In simple language, we combine the idea of TD learning with Quasi Newton algorithm SGD-QN. We explore the development of QNTD algorithm and discuss its convergence properties. We support our ideas with empirical results on 4 standard benchmarks in reinforcement learning literature with 2 small problems, Random Walk and Boyan chain and 2 bigger problems, cart-pole and linked-pole balancing. Empirical studies show that QNTD speeds up convergence and provides better accuracy in comparison to the conventional TD.}
}

Endnote

%0 Conference Paper
%T Quasi Newton Temporal Difference Learning
%A Arash Givchi
%A Maziar Palhang
%B Proceedings of the Sixth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Dinh Phung
%E Hang Li	
%F pmlr-v39-givchi14
%I PMLR
%P 159--172
%U https://proceedings.mlr.press/v39/givchi14.html
%V 39
%X Fast convergent and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD, have faster convergence rate but they are computationally slow. On the other hand, there are algorithms that are computationally fast but with slower convergence rate, among them are TD, RG, GTD2 and TDC. This paper presents a regularized Quasi Newton Temporal Difference learning algorithm which uses the second-order information while maintaining a fast convergence rate. In simple language, we combine the idea of TD learning with Quasi Newton algorithm SGD-QN. We explore the development of QNTD algorithm and discuss its convergence properties. We support our ideas with empirical results on 4 standard benchmarks in reinforcement learning literature with 2 small problems, Random Walk and Boyan chain and 2 bigger problems, cart-pole and linked-pole balancing. Empirical studies show that QNTD speeds up convergence and provides better accuracy in comparison to the conventional TD.

RIS


TY  - CPAPER
TI  - Quasi Newton Temporal Difference Learning
AU  - Arash Givchi
AU  - Maziar Palhang
BT  - Proceedings of the Sixth Asian Conference on Machine Learning
DA  - 2015/02/16
ED  - Dinh Phung
ED  - Hang Li	
ID  - pmlr-v39-givchi14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 39
SP  - 159
EP  - 172
L1  - http://proceedings.mlr.press/v39/givchi14.pdf
UR  - https://proceedings.mlr.press/v39/givchi14.html
AB  - Fast convergent and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD, have faster convergence rate but they are computationally slow. On the other hand, there are algorithms that are computationally fast but with slower convergence rate, among them are TD, RG, GTD2 and TDC. This paper presents a regularized Quasi Newton Temporal Difference learning algorithm which uses the second-order information while maintaining a fast convergence rate. In simple language, we combine the idea of TD learning with Quasi Newton algorithm SGD-QN. We explore the development of QNTD algorithm and discuss its convergence properties. We support our ideas with empirical results on 4 standard benchmarks in reinforcement learning literature with 2 small problems, Random Walk and Boyan chain and 2 bigger problems, cart-pole and linked-pole balancing. Empirical studies show that QNTD speeds up convergence and provides better accuracy in comparison to the conventional TD.
ER  -

APA


Givchi, A. & Palhang, M.. (2015). Quasi Newton Temporal Difference Learning. Proceedings of the Sixth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 39:159-172 Available from https://proceedings.mlr.press/v39/givchi14.html.

Related Material

Download PDF