Experience-Embedded Visual Foresight

Lin Yen-Chen, Maria Bauza, Phillip Isola
Proceedings of the Conference on Robot Learning, PMLR 100:1015-1024, 2020.

Abstract

Visual foresight gives an agent a window into the future, which it can use to anticipate events before they happen and plan strategic behavior. Although impressive results have been achieved on video prediction in constrained settings, these models fail to generalize when confronted with unfamiliar real-world objects. In this paper, we tackle the generalization problem via fast adaptation, where we train a prediction model to quickly adapt to the observed visual dynamics of a novel object. Our method, Experience-embedded Visual Foresight (EVF), jointly learns a fast adaptation module, which encodes observed trajectories of the new object into a vector embedding, and a visual prediction model, which conditions on this embedding to generate physically plausible predictions. For evaluation, we compare our method against baselines on video prediction and benchmark its utility on two real world control tasks. We show that our method is able to quickly adapt to new visual dynamics and achieves lower error than the baselines when manipulating novel objects. Videos are available at: http://evf.csail.mit.edu/.
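
To make the high-level structure described in the abstract concrete, the sketch below shows one way the two components could fit together: an experience encoder that compresses a few observed trajectories of a novel object into a single embedding vector, and a next-frame predictor conditioned on that embedding and the current action. This is a minimal illustration in PyTorch, not the authors' implementation; the layer choices, the 64x64 frame size, the 4-dimensional action, and the additive conditioning are all assumptions made for the example.

# Minimal sketch of the EVF structure from the abstract. NOT the authors' exact
# architecture: layer choices, sizes, and 64x64 frames are illustrative assumptions.
import torch
import torch.nn as nn


class ExperienceEncoder(nn.Module):
    """Fast adaptation module: maps observed trajectories of a novel object
    to one embedding vector (here, by pooling per-frame conv features)."""

    def __init__(self, embed_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, trajectories):
        # trajectories: (batch, time, 3, 64, 64) frames of the novel object
        b, t, c, h, w = trajectories.shape
        feats = self.conv(trajectories.reshape(b * t, c, h, w)).flatten(1)
        feats = feats.reshape(b, t, -1).mean(dim=1)  # pool over time
        return self.fc(feats)  # (batch, embed_dim)


class ConditionalPredictor(nn.Module):
    """Visual prediction model: predicts the next frame conditioned on the
    current frame, the action, and the experience embedding."""

    def __init__(self, embed_dim=64, action_dim=4):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
        )
        self.cond_proj = nn.Linear(embed_dim + action_dim, 64)
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame, action, embedding):
        h = self.encode(frame)
        cond = self.cond_proj(torch.cat([embedding, action], dim=-1))
        h = h + cond[:, :, None, None]  # broadcast conditioning over space
        return self.decode(h)


if __name__ == "__main__":
    enc, pred = ExperienceEncoder(), ConditionalPredictor()
    context = torch.rand(2, 5, 3, 64, 64)   # 5 observed frames of a new object
    z = enc(context)                        # fast adaptation: one forward pass
    next_frame = pred(torch.rand(2, 3, 64, 64), torch.rand(2, 4), z)
    print(next_frame.shape)                 # torch.Size([2, 3, 64, 64])

Under this reading, adaptation at test time amounts to a single forward pass of the encoder over the new object's context trajectories; the conditioned predictor can then be rolled out for planning, consistent with the "fast adaptation" framing above.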

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-yen-chen20a,
  title     = {Experience-Embedded Visual Foresight},
  author    = {Yen-Chen, Lin and Bauza, Maria and Isola, Phillip},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {1015--1024},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/yen-chen20a/yen-chen20a.pdf},
  url       = {https://proceedings.mlr.press/v100/yen-chen20a.html}
}
Endnote
%0 Conference Paper
%T Experience-Embedded Visual Foresight
%A Lin Yen-Chen
%A Maria Bauza
%A Phillip Isola
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-yen-chen20a
%I PMLR
%P 1015--1024
%U https://proceedings.mlr.press/v100/yen-chen20a.html
%V 100
APA
Yen-Chen, L., Bauza, M., & Isola, P. (2020). Experience-Embedded Visual Foresight. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:1015-1024. Available from https://proceedings.mlr.press/v100/yen-chen20a.html.