DeepMDP: Learning Continuous Latent Space Models for Representation Learning

Carles Gelada; Saurabh Kumar; Jacob Buckman; Ofir Nachum; Marc G. Bellemare

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2170-2179, 2019.

Abstract

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the embedding function as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
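
As a rough illustration of the two latent space losses named in the abstract, the sketch below computes a reward prediction loss and a next-latent-state prediction loss over a batch of transitions. It is not the authors' implementation: the function name `deepmdp_losses`, the callables `embed`, `latent_reward`, and `latent_transition`, and the toy environment are all hypothetical. It also assumes a deterministic latent transition model, so the loss over next latent state distributions is replaced here by a mean Euclidean distance between the predicted and the embedded next latent state.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# One environment step: (observation, action, reward, next observation).
Transition = Tuple[Vector, int, float, Vector]


def deepmdp_losses(
    batch: Sequence[Transition],
    embed: Callable[[Vector], Vector],
    latent_reward: Callable[[Vector, int], float],
    latent_transition: Callable[[Vector, int], Vector],
) -> Tuple[float, float]:
    """Return (mean reward loss, mean latent transition loss) over a batch.

    Assumption: the latent transition model is deterministic, so the
    transition loss is a plain Euclidean distance in latent space.
    """
    reward_loss = 0.0
    transition_loss = 0.0
    for obs, action, reward, next_obs in batch:
        z = embed(obs)            # latent state of the current observation
        z_next = embed(next_obs)  # latent state of the observed successor
        # Loss 1: how well the latent model predicts the true reward.
        reward_loss += abs(reward - latent_reward(z, action))
        # Loss 2: how well the latent model predicts the next latent state.
        z_pred = latent_transition(z, action)
        transition_loss += sum((a - b) ** 2 for a, b in zip(z_next, z_pred)) ** 0.5
    n = len(batch)
    return reward_loss / n, transition_loss / n


# Hypothetical toy setup: the first two observation entries are the true
# low-dimensional state, the remaining entries are distractor values.
def embed(obs: Vector) -> List[float]:
    return list(obs[:2])


def latent_reward(z: Vector, action: int) -> float:
    return z[0]


def latent_transition(z: Vector, action: int) -> List[float]:
    step = 1.0 if action == 1 else -1.0
    return [z[0] + step, z[1]]


toy_batch: List[Transition] = [
    ([0.0, 0.0, 9.0, 9.0], 1, 0.0, [1.0, 0.0, 3.0, 3.0]),
    ([1.0, 0.0, 3.0, 3.0], 0, 1.0, [0.0, 0.0, 7.0, 7.0]),
]

losses = deepmdp_losses(toy_batch, embed, latent_reward, latent_transition)
```

In this toy setup the embedding discards the distractor entries and the latent model matches the environment exactly, so both losses are zero. In practice the three callables would be parameterized networks trained jointly to reduce both terms.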

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-gelada19a,
  title = 	 {{D}eep{MDP}: Learning Continuous Latent Space Models for Representation Learning},
  author =       {Gelada, Carles and Kumar, Saurabh and Buckman, Jacob and Nachum, Ofir and Bellemare, Marc G.},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {2170--2179},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/gelada19a/gelada19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/gelada19a.html},
  abstract = 	 {Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a \textit{DeepMDP}, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the embedding function as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.}
}

Endnote

%0 Conference Paper
%T DeepMDP: Learning Continuous Latent Space Models for Representation Learning
%A Carles Gelada
%A Saurabh Kumar
%A Jacob Buckman
%A Ofir Nachum
%A Marc G. Bellemare
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-gelada19a
%I PMLR
%P 2170--2179
%U https://proceedings.mlr.press/v97/gelada19a.html
%V 97
%X Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a \textit{DeepMDP}, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the embedding function as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.

APA


Gelada, C., Kumar, S., Buckman, J., Nachum, O., & Bellemare, M. G. (2019). DeepMDP: Learning Continuous Latent Space Models for Representation Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2170-2179. Available from https://proceedings.mlr.press/v97/gelada19a.html.
