Dyna-style planning with linear function approximation and prioritized sweeping

Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, Michael Bowling
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, PMLR R6:528-536, 2008.

Abstract

We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main results are to prove that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions. In the policy evaluation setting, we prove that the limit point is the least-squares (LSTD) solution. An implication of our results is that prioritized sweeping can be soundly extended to the linear approximation case, backing up to preceding features rather than to preceding states. We introduce two versions of prioritized sweeping with linear Dyna and briefly illustrate their performance empirically on the Mountain Car and Boyan Chain problems.
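For intuition, the following is a minimal Python sketch of what the abstract describes: planning updates generated from a learned linear model, with a feature-based priority queue. The model form (a next-feature matrix F and reward vector b, so that F @ x predicts the expected next feature vector and b @ x the expected reward from feature vector x) follows the standard linear Dyna formulation, but every name and detail below (planning_step, sweep, the priority rule, the constants) is an illustrative assumption, not the paper's exact pseudocode.

import heapq
import numpy as np

def planning_step(theta, F, b, x, gamma, alpha):
    """One Dyna-style planning update: imagine a transition from feature
    vector x using the linear model (F, b), then apply a linear TD(0)
    update to the value weights theta."""
    delta = b @ x + gamma * (theta @ (F @ x)) - theta @ x  # TD error
    return theta + alpha * delta * x, delta

def sweep(theta, F, b, start, gamma=0.9, alpha=0.1,
          n_updates=20, threshold=1e-4):
    """Feature-based prioritized sweeping: plan from unit feature
    vectors e_i, highest priority first. When the weight for feature i
    changes, the features j that predict feature i (F[i, j] != 0) are
    queued with priority proportional to the induced change, i.e.
    backing up to preceding features rather than to preceding states."""
    n = len(theta)
    # Max-heap via negated priorities; 'start' stands in for the feature
    # most recently touched by real experience (an assumption here).
    heap = [(-1.0, start)]
    for _ in range(n_updates):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        e_i = np.zeros(n)
        e_i[i] = 1.0
        theta, delta = planning_step(theta, F, b, e_i, gamma, alpha)
        for j in np.nonzero(F[i])[0]:  # predecessors of feature i
            priority = abs(alpha * delta * F[i, j])
            if priority > threshold:
                heapq.heappush(heap, (-priority, int(j)))
    return theta

However the planning feature vectors are chosen, the paper's result is that the planning process converges to a unique solution under natural conditions, and in the policy evaluation setting that limit point is the LSTD solution.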

Cite this Paper

BibTeX
@InProceedings{pmlr-vR6-sutton08a,
  title     = {Dyna-style planning with linear function approximation and prioritized sweeping},
  author    = {Sutton, Richard S. and Szepesv\'{a}ri, Csaba and Geramifard, Alborz and Bowling, Michael},
  booktitle = {Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence},
  pages     = {528--536},
  year      = {2008},
  editor    = {McAllester, David A. and Myllym\"{a}ki, Petri},
  volume    = {R6},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/r6/main/assets/sutton08a/sutton08a.pdf},
  url       = {https://proceedings.mlr.press/r6/sutton08a.html},
  note      = {Reissued by PMLR on 09 October 2024.}
}
Endnote
%0 Conference Paper
%T Dyna-style planning with linear function approximation and prioritized sweeping
%A Richard S. Sutton
%A Csaba Szepesvári
%A Alborz Geramifard
%A Michael Bowling
%B Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2008
%E David A. McAllester
%E Petri Myllymäki
%F pmlr-vR6-sutton08a
%I PMLR
%P 528--536
%U https://proceedings.mlr.press/r6/sutton08a.html
%V R6
%Z Reissued by PMLR on 09 October 2024.
APA
Sutton, R.S., Szepesvári, C., Geramifard, A. & Bowling, M. (2008). Dyna-style planning with linear function approximation and prioritized sweeping. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research R6:528-536. Available from https://proceedings.mlr.press/r6/sutton08a.html. Reissued by PMLR on 09 October 2024.
