Dual Temporal Difference Learning

Min Yang, Yuxi Li, Dale Schuurmans
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:631-638, 2009.

Abstract

Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of temporal difference learning using dual representations. We contribute significant progress by proving the convergence of dual temporal difference learning with eligibility traces. Experimental results suggest that the dual algorithms seem to demonstrate empirical benefits over standard primal algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v5-yang09a,
  title = {Dual Temporal Difference Learning},
  author = {Min Yang and Yuxi Li and Dale Schuurmans},
  booktitle = {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages = {631--638},
  year = {2009},
  editor = {David van Dyk and Max Welling},
  volume = {5},
  series = {Proceedings of Machine Learning Research},
  address = {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month = {16--18 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v5/yang09a/yang09a.pdf},
  url = {http://proceedings.mlr.press/v5/yang09a.html},
  abstract = {Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of temporal difference learning using dual representations. We contribute significant progress by proving the convergence of dual temporal difference learning with eligibility traces. Experimental results suggest that the dual algorithms seem to demonstrate empirical benefits over standard primal algorithms.}
}
Endnote
%0 Conference Paper
%T Dual Temporal Difference Learning
%A Min Yang
%A Yuxi Li
%A Dale Schuurmans
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-yang09a
%I PMLR
%J Proceedings of Machine Learning Research
%P 631--638
%U http://proceedings.mlr.press
%V 5
%W PMLR
%X Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of temporal difference learning using dual representations. We contribute significant progress by proving the convergence of dual temporal difference learning with eligibility traces. Experimental results suggest that the dual algorithms seem to demonstrate empirical benefits over standard primal algorithms.
RIS
TY  - CPAPER
TI  - Dual Temporal Difference Learning
AU  - Min Yang
AU  - Yuxi Li
AU  - Dale Schuurmans
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
PY  - 2009/04/15
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-yang09a
PB  - PMLR
SP  - 631
DP  - PMLR
EP  - 638
L1  - http://proceedings.mlr.press/v5/yang09a/yang09a.pdf
UR  - http://proceedings.mlr.press/v5/yang09a.html
AB  - Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of temporal difference learning using dual representations. We contribute significant progress by proving the convergence of dual temporal difference learning with eligibility traces. Experimental results suggest that the dual algorithms seem to demonstrate empirical benefits over standard primal algorithms.
ER  -
APA
Yang, M., Li, Y., & Schuurmans, D. (2009). Dual Temporal Difference Learning. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in PMLR 5:631-638.
