Visual Grounding of Learned Physical Models

Yunzhu Li, Toru Lin, Kexin Yi, Daniel Bear, Daniel Yamins, Jiajun Wu, Joshua Tenenbaum, Antonio Torralba
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:5927-5936, 2020.

Abstract

Humans intuitively recognize objects’ physical properties and predict their motion, even when the objects are engaged in complicated interactions. The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models. In this work, we present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors. The visual prior predicts a particle-based representation of the system from visual observations. An inference module operates on those particles, predicting and refining estimates of particle locations, object states, and physical parameters, subject to the constraints imposed by the dynamics prior, which we refer to as visual grounding. We demonstrate the effectiveness of our method in environments involving rigid objects, deformable materials, and fluids. Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
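To make the inference loop described above concrete, here is a minimal, purely illustrative sketch: a toy dynamics prior (a single particle with linear drag) and an inference routine that refines an estimate of a physical parameter so that rollouts match observed positions. The drag model, finite-difference gradient descent, and all function names are assumptions for illustration, not the authors' actual model.

```python
def dynamics_prior(pos, vel, drag, dt=0.1):
    """One step of a toy particle dynamics model with linear drag."""
    vel = vel * (1.0 - drag * dt)
    return pos + vel * dt, vel

def rollout(pos0, vel0, drag, steps):
    """Predict a trajectory of particle positions under the dynamics prior."""
    pos, vel = pos0, vel0
    traj = []
    for _ in range(steps):
        pos, vel = dynamics_prior(pos, vel, drag)
        traj.append(pos)
    return traj

def infer_drag(observed, pos0, vel0, steps, lr=0.5, iters=200):
    """Refine the drag estimate by finite-difference gradient descent on the
    squared error between predicted and observed positions."""
    drag = 0.0
    eps = 1e-4
    def loss(d):
        pred = rollout(pos0, vel0, d, steps)
        return sum((p - o) ** 2 for p, o in zip(pred, observed))
    for _ in range(iters):
        grad = (loss(drag + eps) - loss(drag - eps)) / (2 * eps)
        drag -= lr * grad
    return drag

# Generate observations with a known "true" drag, then recover it from them.
true_drag = 0.5
observations = rollout(0.0, 1.0, true_drag, steps=10)
estimate = infer_drag(observations, 0.0, 1.0, steps=10)
```

In the paper's setting the state is a full particle set predicted from video and the dynamics prior is a learned model, but the structure is analogous: a handful of observations constrain the physical parameters, which then support accurate forward prediction.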

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-li20j,
  title     = {Visual Grounding of Learned Physical Models},
  author    = {Li, Yunzhu and Lin, Toru and Yi, Kexin and Bear, Daniel and Yamins, Daniel and Wu, Jiajun and Tenenbaum, Joshua and Torralba, Antonio},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {5927--5936},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/li20j/li20j.pdf},
  url       = {http://proceedings.mlr.press/v119/li20j.html},
  abstract  = {Humans intuitively recognize objects’ physical properties and predict their motion, even when the objects are engaged in complicated interactions. The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models. In this work, we present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors. The visual prior predicts a particle-based representation of the system from visual observations. An inference module operates on those particles, predicting and refining estimates of particle locations, object states, and physical parameters, subject to the constraints imposed by the dynamics prior, which we refer to as visual grounding. We demonstrate the effectiveness of our method in environments involving rigid objects, deformable materials, and fluids. Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.}
}
Endnote
%0 Conference Paper
%T Visual Grounding of Learned Physical Models
%A Yunzhu Li
%A Toru Lin
%A Kexin Yi
%A Daniel Bear
%A Daniel Yamins
%A Jiajun Wu
%A Joshua Tenenbaum
%A Antonio Torralba
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-li20j
%I PMLR
%P 5927--5936
%U http://proceedings.mlr.press/v119/li20j.html
%V 119
%X Humans intuitively recognize objects’ physical properties and predict their motion, even when the objects are engaged in complicated interactions. The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models. In this work, we present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors. The visual prior predicts a particle-based representation of the system from visual observations. An inference module operates on those particles, predicting and refining estimates of particle locations, object states, and physical parameters, subject to the constraints imposed by the dynamics prior, which we refer to as visual grounding. We demonstrate the effectiveness of our method in environments involving rigid objects, deformable materials, and fluids. Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
APA
Li, Y., Lin, T., Yi, K., Bear, D., Yamins, D., Wu, J., Tenenbaum, J. & Torralba, A. (2020). Visual Grounding of Learned Physical Models. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:5927-5936. Available from http://proceedings.mlr.press/v119/li20j.html.