A Deeper Look at Planning as Learning from Replay

Harm Vanseijen; Rich Sutton

A Deeper Look at Planning as Learning from Replay

Harm Vanseijen, Rich Sutton

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2314-2322, 2015.

Abstract

In reinforcement learning, the notions of experience replay, and of planning as learning from replayed experience, have long been used to find good policies with minimal training data. Replay can be seen either as model-based reinforcement learning, where the store of past experiences serves as the model, or as a way to avoid a conventional model of the environment altogether. In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. First, we show for the first time an exact equivalence between the sequence of value functions found by a model-based policy-evaluation method and by a model-free method with replay. Second, we present a general replay method that can mimic a spectrum of methods ranging from the explicitly model-free (TD(0)) to the explicitly model-based (linear Dyna). Finally, we use insights gained from these relationships to design a new model-based reinforcement learning algorithm for linear function approximation. This method, which we call forgetful LSTD(lambda), improves upon regular LSTD(lambda) because it extends more naturally to online control, and improves upon linear Dyna because it is a multi-step method, enabling it to perform well even in non-Markov problems or, equivalently, in problems with significant function approximation.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-vanseijen15,
  title = 	 {A Deeper Look at Planning as Learning from Replay},
  author = 	 {Vanseijen, Harm and Sutton, Rich},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {2314--2322},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/vanseijen15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/vanseijen15.html},
  abstract = 	 {In reinforcement learning, the notions of experience replay, and of planning as learning from replayed experience, have long been used to find good policies with minimal training data. Replay can be seen either as model-based reinforcement learning, where the store of past experiences serves as the model, or as a way to avoid a conventional model of the environment altogether. In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. First, we show for the first time an exact equivalence between the sequence of value functions found by a model-based policy-evaluation method and by a model-free method with replay. Second, we present a general replay method that can mimic a spectrum of methods ranging from the explicitly model-free (TD(0)) to the explicitly model-based (linear Dyna). Finally, we use insights gained from these relationships to design a new model-based reinforcement learning algorithm for linear function approximation. This method, which we call forgetful LSTD(lambda), improves upon regular LSTD(lambda) because it extends more naturally to online control, and improves upon linear Dyna because it is a multi-step method, enabling it to perform well even in non-Markov problems or, equivalently, in problems with significant function approximation.}
}

Endnote

%0 Conference Paper
%T A Deeper Look at Planning as Learning from Replay
%A Harm Vanseijen
%A Rich Sutton
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-vanseijen15
%I PMLR
%P 2314--2322
%U https://proceedings.mlr.press/v37/vanseijen15.html
%V 37
%X In reinforcement learning, the notions of experience replay, and of planning as learning from replayed experience, have long been used to find good policies with minimal training data. Replay can be seen either as model-based reinforcement learning, where the store of past experiences serves as the model, or as a way to avoid a conventional model of the environment altogether. In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. First, we show for the first time an exact equivalence between the sequence of value functions found by a model-based policy-evaluation method and by a model-free method with replay. Second, we present a general replay method that can mimic a spectrum of methods ranging from the explicitly model-free (TD(0)) to the explicitly model-based (linear Dyna). Finally, we use insights gained from these relationships to design a new model-based reinforcement learning algorithm for linear function approximation. This method, which we call forgetful LSTD(lambda), improves upon regular LSTD(lambda) because it extends more naturally to online control, and improves upon linear Dyna because it is a multi-step method, enabling it to perform well even in non-Markov problems or, equivalently, in problems with significant function approximation.

RIS


TY  - CPAPER
TI  - A Deeper Look at Planning as Learning from Replay
AU  - Harm Vanseijen
AU  - Rich Sutton
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-vanseijen15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 2314
EP  - 2322
L1  - http://proceedings.mlr.press/v37/vanseijen15.pdf
UR  - https://proceedings.mlr.press/v37/vanseijen15.html
AB  - In reinforcement learning, the notions of experience replay, and of planning as learning from replayed experience, have long been used to find good policies with minimal training data. Replay can be seen either as model-based reinforcement learning, where the store of past experiences serves as the model, or as a way to avoid a conventional model of the environment altogether. In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. First, we show for the first time an exact equivalence between the sequence of value functions found by a model-based policy-evaluation method and by a model-free method with replay. Second, we present a general replay method that can mimic a spectrum of methods ranging from the explicitly model-free (TD(0)) to the explicitly model-based (linear Dyna). Finally, we use insights gained from these relationships to design a new model-based reinforcement learning algorithm for linear function approximation. This method, which we call forgetful LSTD(lambda), improves upon regular LSTD(lambda) because it extends more naturally to online control, and improves upon linear Dyna because it is a multi-step method, enabling it to perform well even in non-Markov problems or, equivalently, in problems with significant function approximation.
ER  -

APA


Vanseijen, H. & Sutton, R.. (2015). A Deeper Look at Planning as Learning from Replay. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2314-2322 Available from https://proceedings.mlr.press/v37/vanseijen15.html.

Related Material

Download PDF