Learning Object Manipulation Skills via Approximate State Estimation from Real Videos

Vladimír Petrík, Makarand Tapaswi, Ivan Laptev, Josef Sivic
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:296-312, 2021.

Abstract

Humans are adept at learning new tasks by watching a few instructional videos. On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain. In this paper, we explore a method that facilitates learning object manipulation skills directly from videos. Leveraging recent advances in 2D visual recognition and differentiable rendering, we develop an optimization-based method to estimate a coarse 3D state representation for the hand and the manipulated object(s) without requiring any supervision. We use these estimated state trajectories as dense rewards for an agent that learns to mimic them through reinforcement learning. We evaluate our method on simple single- and two-object actions from the Something-Something dataset. Our approach allows an agent to learn actions from single videos, while watching multiple demonstrations makes the policy more robust. We show that policies learned in a simulated environment can be easily transferred to a real robot.
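
To illustrate the dense-reward idea described above, the sketch below shows one plausible way to turn a video-derived state trajectory into a per-step reward for reinforcement learning. It is an illustrative assumption, not the paper's implementation: the flat state vector of hand and object positions, the function name dense_tracking_reward, the exponential kernel, and the sigma parameter are all hypothetical choices made for this example.

import numpy as np

def dense_tracking_reward(agent_state, reference_state, sigma=0.1):
    # Dense reward that grows as the agent's hand/object state approaches
    # the reference state estimated from the video at the same time step.
    # Both inputs are flat arrays, e.g. concatenated 3D positions of the
    # hand and the manipulated object(s).
    diff = np.asarray(agent_state) - np.asarray(reference_state)
    dist_sq = float(np.sum(diff ** 2))
    # Exponential kernel keeps the reward bounded in (0, 1].
    return float(np.exp(-dist_sq / (2.0 * sigma ** 2)))

# Example: reward the simulated agent for matching the estimated trajectory at step t.
reference_trajectory = np.zeros((50, 6))  # placeholder for video-derived states
agent_state_t = np.array([0.02, -0.01, 0.15, 0.0, 0.0, 0.1])
print(dense_tracking_reward(agent_state_t, reference_trajectory[10]))

Summing such per-step rewards over an episode encourages the agent to track the full reference trajectory rather than only its final state, which is what makes the video-derived signal dense.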

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-petrik21a,
  title     = {Learning Object Manipulation Skills via Approximate State Estimation from Real Videos},
  author    = {Petr\'{i}k, Vladim\'{i}r and Tapaswi, Makarand and Laptev, Ivan and Sivic, Josef},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {296--312},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/petrik21a/petrik21a.pdf},
  url       = {https://proceedings.mlr.press/v155/petrik21a.html}
}
APA
Petrík, V., Tapaswi, M., Laptev, I. & Sivic, J. (2021). Learning Object Manipulation Skills via Approximate State Estimation from Real Videos. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:296-312. Available from https://proceedings.mlr.press/v155/petrik21a.html.
