Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning

Karl Pertsch, Oleh Rybkin, Jingyun Yang, Shenghao Zhou, Konstantinos Derpanis, Kostas Daniilidis, Joseph Lim, Andrew Jaegle
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:969-979, 2020.

Abstract

To flexibly and efficiently reason about dynamics of temporal sequences, abstract representations that compactly represent the important information in the sequence are needed. One way of constructing such representations is by focusing on the important events in a sequence. In this paper, we propose a model that learns both to discover such key events (or keyframes) as well as to represent the sequence in terms of them. We do so using a hierarchical Keyframe-Inpainter (KeyIn) model that first generates keyframes and their temporal placement and then inpaints the sequences between keyframes. We propose a fully differentiable formulation for efficiently learning the keyframe placement. We show that KeyIn finds informative keyframes in several datasets with diverse dynamics. When evaluated on a planning task, KeyIn outperforms other recent proposals for learning hierarchical representations.

Cite this Paper


BibTeX
@InProceedings{pmlr-v120-pertsch20a,
  title = {Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning},
  author = {Pertsch, Karl and Rybkin, Oleh and Yang, Jingyun and Zhou, Shenghao and Derpanis, Konstantinos and Daniilidis, Kostas and Lim, Joseph and Jaegle, Andrew},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  pages = {969--979},
  year = {2020},
  editor = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie},
  volume = {120},
  series = {Proceedings of Machine Learning Research},
  month = {10--11 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v120/pertsch20a/pertsch20a.pdf},
  url = {https://proceedings.mlr.press/v120/pertsch20a.html},
  abstract = {To flexibly and efficiently reason about dynamics of temporal sequences, abstract representations that compactly represent the important information in the sequence are needed. One way of constructing such representations is by focusing on the important events in a sequence. In this paper, we propose a model that learns both to discover such key events (or keyframes) as well as to represent the sequence in terms of them. We do so using a hierarchical Keyframe-Inpainter (KeyIn) model that first generates keyframes and their temporal placement and then inpaints the sequences between keyframes. We propose a fully differentiable formulation for efficiently learning the keyframe placement. We show that KeyIn finds informative keyframes in several datasets with diverse dynamics. When evaluated on a planning task, KeyIn outperforms other recent proposals for learning hierarchical representations.}
}
Endnote
%0 Conference Paper
%T Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning
%A Karl Pertsch
%A Oleh Rybkin
%A Jingyun Yang
%A Shenghao Zhou
%A Konstantinos Derpanis
%A Kostas Daniilidis
%A Joseph Lim
%A Andrew Jaegle
%B Proceedings of the 2nd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2020
%E Alexandre M. Bayen
%E Ali Jadbabaie
%E George Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire Tomlin
%E Melanie Zeilinger
%F pmlr-v120-pertsch20a
%I PMLR
%P 969--979
%U https://proceedings.mlr.press/v120/pertsch20a.html
%V 120
%X To flexibly and efficiently reason about dynamics of temporal sequences, abstract representations that compactly represent the important information in the sequence are needed. One way of constructing such representations is by focusing on the important events in a sequence. In this paper, we propose a model that learns both to discover such key events (or keyframes) as well as to represent the sequence in terms of them. We do so using a hierarchical Keyframe-Inpainter (KeyIn) model that first generates keyframes and their temporal placement and then inpaints the sequences between keyframes. We propose a fully differentiable formulation for efficiently learning the keyframe placement. We show that KeyIn finds informative keyframes in several datasets with diverse dynamics. When evaluated on a planning task, KeyIn outperforms other recent proposals for learning hierarchical representations.
APA
Pertsch, K., Rybkin, O., Yang, J., Zhou, S., Derpanis, K., Daniilidis, K., Lim, J. & Jaegle, A. (2020). Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:969-979. Available from https://proceedings.mlr.press/v120/pertsch20a.html.
