[edit]
Entity Abstraction in Visual Model-Based Reinforcement Learning
Proceedings of the Conference on Robot Learning, PMLR 100:1439-1456, 2020.
Abstract
We present OP3, a framework for model-based reinforcement learning that acquires object representations from raw visual observations without supervision and uses them to predict and plan. To ground these abstract representations of entities to actual objects in the world, we formulate an interactive inference algorithm which incorporates dynamic information in the scene. Our model can handle a variable number of entities by symmetrically processing each object representation with the same locally-scoped function. On block-stacking tasks, OP3 can generalize to novel block configurations and more objects than seen during training, outperforming both a model that assumes access to object supervision and a state-of-the-art video prediction model.