VideoDex: Learning Dexterity from Internet Videos

Kenneth Shaw, Shikhar Bahl, Deepak Pathak
Proceedings of The 6th Conference on Robot Learning, PMLR 205:654-665, 2023.

Abstract

To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a system based on a robot arm and dexterous hand and show strong results on many different manipulation tasks, outperforming various state-of-the-art methods. For videos and supplemental material, visit our website at https://video-dex.github.io.

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-shaw23a,
  title     = {VideoDex: Learning Dexterity from Internet Videos},
  author    = {Shaw, Kenneth and Bahl, Shikhar and Pathak, Deepak},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {654--665},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/shaw23a/shaw23a.pdf},
  url       = {https://proceedings.mlr.press/v205/shaw23a.html},
  abstract  = {To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a system based on a robot arm and dexterous hand and show strong results on many different manipulation tasks, outperforming various state-of-the-art methods. For videos and supplemental material, visit our website at https://video-dex.github.io.}
}
Endnote
%0 Conference Paper
%T VideoDex: Learning Dexterity from Internet Videos
%A Kenneth Shaw
%A Shikhar Bahl
%A Deepak Pathak
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-shaw23a
%I PMLR
%P 654--665
%U https://proceedings.mlr.press/v205/shaw23a.html
%V 205
%X To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a system based on a robot arm and dexterous hand and show strong results on many different manipulation tasks, outperforming various state-of-the-art methods. For videos and supplemental material, visit our website at https://video-dex.github.io.
APA
Shaw, K., Bahl, S., & Pathak, D. (2023). VideoDex: Learning Dexterity from Internet Videos. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:654-665. Available from https://proceedings.mlr.press/v205/shaw23a.html.