Quasi Newton Temporal Difference Learning
Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:159-172, 2015.
Abstract
Fast-converging and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD converge quickly but are computationally expensive. On the other hand, algorithms such as TD, RG, GTD2 and TDC are computationally cheap but converge more slowly. This paper presents a regularized Quasi Newton Temporal Difference (QNTD) learning algorithm which uses second-order information while maintaining a fast convergence rate. In simple terms, we combine the idea of TD learning with the quasi-Newton algorithm SGD-QN. We describe the development of the QNTD algorithm and discuss its convergence properties. We support our ideas with empirical results on four standard benchmarks from the reinforcement learning literature: two small problems, Random Walk and Boyan chain, and two larger problems, cart-pole and linked-pole balancing. The empirical studies show that QNTD converges faster and is more accurate than conventional TD.
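For readers unfamiliar with the baseline, the following is a minimal sketch of conventional TD(0) policy evaluation with a linear value function on a small Random Walk. It is not the QNTD update from the paper. The function names, the five-state chain, the one-hot features, the step size, and the optional `scale` argument are all assumptions made for illustration. The `scale` vector only marks where a per-weight (diagonal) curvature estimate from a quasi-Newton method could enter the update; how QNTD actually computes such a quantity is not stated in this abstract.

```python
import numpy as np


def td0_update(w, phi, reward, phi_next, gamma, alpha, scale=None):
    """One TD(0) step for a linear value function V(s) = w @ phi(s).

    `scale` is a hypothetical per-weight scaling vector (an assumption,
    not the paper's rule); with scale=None this is plain TD(0).
    """
    # TD error: bootstrapped target minus current estimate
    delta = reward + gamma * (w @ phi_next) - (w @ phi)
    step = alpha * delta * phi
    if scale is not None:
        step = scale * step  # elementwise rescaling of the update
    return w + step


def random_walk_td0(n_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Evaluate the uniform random policy on a chain with two terminal ends.

    Reward is 1 on exiting to the right, 0 otherwise (illustrative setup).
    """
    rng = np.random.default_rng(seed)
    features = np.eye(n_states)      # one-hot features, i.e. tabular case
    terminal = np.zeros(n_states)    # terminal states have value 0
    w = np.full(n_states, 0.5)
    for _ in range(episodes):
        s = n_states // 2            # start in the middle state
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:           # exit on the left, reward 0
                w = td0_update(w, features[s], 0.0, terminal, gamma, alpha)
                break
            if s_next >= n_states:   # exit on the right, reward 1
                w = td0_update(w, features[s], 1.0, terminal, gamma, alpha)
                break
            w = td0_update(w, features[s], 0.0, features[s_next], gamma, alpha)
            s = s_next
    return w


if __name__ == "__main__":
    w = random_walk_td0()
    true_values = np.arange(1, 6) / 6.0  # exact values for the 5-state chain
    print("estimate:", np.round(w, 3))
    print("true:    ", np.round(true_values, 3))
```

With one-hot features the learned weights are the state-value estimates themselves, so they can be compared directly against the exact values 1/6, ..., 5/6 of this chain.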