Quasi Newton Temporal Difference Learning

Arash Givchi, Maziar Palhang
Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:159-172, 2015.

Abstract

Fast-converging, computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD have faster convergence rates but are computationally expensive; on the other hand, algorithms such as TD, RG, GTD2 and TDC are computationally cheap but converge more slowly. This paper presents a regularized Quasi Newton Temporal Difference (QNTD) learning algorithm which uses second-order information while maintaining a fast convergence rate. In short, we combine the idea of TD learning with the Quasi Newton algorithm SGD-QN. We describe the development of the QNTD algorithm and discuss its convergence properties. We support our ideas with empirical results on four standard benchmarks from the reinforcement learning literature: two small problems, Random Walk and Boyan chain, and two larger problems, cart-pole and linked-pole balancing. The empirical studies show that QNTD speeds up convergence and provides better accuracy than conventional TD.
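To make the abstract's idea concrete, the sketch below contrasts the conventional linear TD(0) update with a diagonally preconditioned variant in the spirit of SGD-QN. The specific update rule, the running estimate `d` of the diagonal of the TD matrix A = E[phi (phi - gamma phi')^T], and the parameters `mu` and `reg` are illustrative assumptions for exposition only, not the paper's QNTD algorithm.

```python
import numpy as np

def td0_step(w, alpha, phi, phi_next, r, gamma):
    """Conventional linear TD(0): first-order update of V(s) = w . phi(s)."""
    delta = r + gamma * phi_next.dot(w) - phi.dot(w)   # one-step TD error
    return w + alpha * delta * phi

def diag_qn_td_step(w, d, alpha, phi, phi_next, r, gamma, mu=0.99, reg=1e-3):
    """Diagonally preconditioned TD step in the spirit of SGD-QN (a sketch).

    d tracks a running estimate of the diagonal of A = E[phi (phi - gamma phi')^T],
    the matrix governing linear TD; its regularized inverse rescales each
    coordinate of the update.  This is NOT the paper's QNTD update rule.
    """
    delta = r + gamma * phi_next.dot(w) - phi.dot(w)           # one-step TD error
    d = mu * d + (1.0 - mu) * phi * (phi - gamma * phi_next)   # diag(A) estimate
    precond = 1.0 / np.maximum(d, reg)                         # bounded per-coordinate scaling
    return w + alpha * precond * delta * phi, d

# Tiny usage example on random features (illustrative only).
rng = np.random.default_rng(0)
n = 5
w, d = np.zeros(n), np.ones(n)
for _ in range(100):
    phi, phi_next = rng.random(n), rng.random(n)
    r = rng.random()
    w, d = diag_qn_td_step(w, d, 0.1, phi, phi_next, r, gamma=0.9)
```

The intent of the per-coordinate scaling is the same as in SGD-QN: replace a single global step size with curvature-aware step sizes while keeping per-step cost linear in the number of features.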

Cite this Paper


BibTeX
@InProceedings{pmlr-v39-givchi14,
  title     = {Quasi Newton Temporal Difference Learning},
  author    = {Givchi, Arash and Palhang, Maziar},
  booktitle = {Proceedings of the Sixth Asian Conference on Machine Learning},
  pages     = {159--172},
  year      = {2015},
  editor    = {Phung, Dinh and Li, Hang},
  volume    = {39},
  series    = {Proceedings of Machine Learning Research},
  address   = {Nha Trang City, Vietnam},
  month     = {26--28 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v39/givchi14.pdf},
  url       = {https://proceedings.mlr.press/v39/givchi14.html}
}
APA
Givchi, A. & Palhang, M. (2015). Quasi Newton Temporal Difference Learning. Proceedings of the Sixth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 39:159-172. Available from https://proceedings.mlr.press/v39/givchi14.html.
