DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2170-2179, 2019.
Abstract
Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the embedding function as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
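The two latent space losses named in the abstract can be illustrated with a minimal sketch for a single transition. Everything below is an assumption made for illustration and is not taken from the paper: the linear embedding, the per-action linear reward and transition models, and all function and parameter names are hypothetical. The sketch also assumes deterministic transitions, in which case the distance between the predicted and observed next-state distributions reduces to the distance between two points in latent space.

```python
import numpy as np

def embed(obs, W_enc):
    """Hypothetical linear embedding: observation -> latent state."""
    return obs @ W_enc

def latent_reward(z, action, w_rew):
    """Hypothetical latent reward model with one weight vector per action."""
    return z @ w_rew[action]

def latent_transition(z, action, W_dyn):
    """Hypothetical deterministic latent transition, one matrix per action."""
    return z @ W_dyn[action]

def deepmdp_losses(obs, action, reward, next_obs, W_enc, w_rew, W_dyn):
    """Return (reward loss, transition loss) for a single transition."""
    z = embed(obs, W_enc)
    z_next = embed(next_obs, W_enc)
    # Reward loss: gap between the observed reward and the latent prediction.
    reward_loss = abs(reward - latent_reward(z, action, w_rew))
    # Transition loss: distance between the embedded next observation and the
    # next latent state predicted by the model (deterministic case only).
    transition_loss = np.linalg.norm(z_next - latent_transition(z, action, W_dyn))
    return reward_loss, transition_loss
```

In practice the two terms would be summed, possibly with weights, and minimized jointly over the embedding and the latent model; the weighting scheme is left open here.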