An Investigation of Model-Free Planning

Arthur Guez; Mehdi Mirza; Karol Gregor; Rishabh Kabra; Sebastien Racaniere; Theophane Weber; David Raposo; Adam Santoro; Laurent Orseau; Tom Eccles; Greg Wayne; David Silver; Timothy Lillicrap

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2464-2473, 2019.

Abstract

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent’s effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-guez19a,
  title = {An Investigation of Model-Free Planning},
  author = {Guez, Arthur and Mirza, Mehdi and Gregor, Karol and Kabra, Rishabh and Racaniere, Sebastien and Weber, Theophane and Raposo, David and Santoro, Adam and Orseau, Laurent and Eccles, Tom and Wayne, Greg and Silver, David and Lillicrap, Timothy},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {2464--2473},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/guez19a/guez19a.pdf},
  url = {https://proceedings.mlr.press/v97/guez19a.html},
  abstract = 	 {The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent’s effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.}
}

Endnote

%0 Conference Paper
%T An Investigation of Model-Free Planning
%A Arthur Guez
%A Mehdi Mirza
%A Karol Gregor
%A Rishabh Kabra
%A Sebastien Racaniere
%A Theophane Weber
%A David Raposo
%A Adam Santoro
%A Laurent Orseau
%A Tom Eccles
%A Greg Wayne
%A David Silver
%A Timothy Lillicrap
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-guez19a
%I PMLR
%P 2464--2473
%U https://proceedings.mlr.press/v97/guez19a.html
%V 97
%X The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent’s effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

APA

Guez, A., Mirza, M., Gregor, K., Kabra, R., Racaniere, S., Weber, T., Raposo, D., Santoro, A., Orseau, L., Eccles, T., Wayne, G., Silver, D. & Lillicrap, T. (2019). An Investigation of Model-Free Planning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2464-2473. Available from https://proceedings.mlr.press/v97/guez19a.html.
