Gradient Temporal Difference Networks


David Silver ;
Proceedings of the Tenth European Workshop on Reinforcement Learning, PMLR 24:117-130, 2013.


Temporal-difference (TD) networks (Sutton and Tanner, 2004) are a predictive represen- tation of state in which each node is an answer to a question about future observations or questions. Unfortunately, existing algorithms for learning TD networks are known to diverge, even in very simple problems. In this paper we present the first sound learning rule for TD networks. Our approach is to develop a true gradient descent algorithm that takes account of all three roles performed by each node in the network: as state, as an answer, and as a target for other questions. Our algorithm combines gradient temporal-difference learning (Maei et al., 2009) with real-time recurrent learning (Williams and Zipser, 1994). We provide a generalisation of the Bellman equation that corresponds to the semantics of the TD network, and prove that our algorithm converges to a fixed point of this equation.

Related Material