Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation

Bryan Chen, Alexander Sax, Francis Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik, Lerrel Pinto
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:2328-2346, 2021.

Abstract

Vision-based robotics often factors the control loop into separate components for perception and control. Conventional perception components usually extract hand-engineered features from the visual input that are then used by the control component in an explicit manner. In contrast, recent advances in deep RL make it possible to learn these features end-to-end during training, but the final result is often brittle, fails unexpectedly under minuscule visual shifts, and comes with a high sample complexity cost. In this work, we study the effects of using mid-level visual representations asynchronously trained for traditional computer vision objectives as a generic and easy-to-decode perceptual state in an end-to-end RL framework. We show that the invariances provided by the mid-level representations aid generalization, improve sample complexity, and lead to higher final performance. Compared to alternative approaches for incorporating invariances, such as domain randomization, using asynchronously trained mid-level representations scales better to harder problems and larger domain shifts and, consequently, successfully trains policies for tasks where domain randomization or learning from scratch failed. Our experimental findings are reported on manipulation and navigation tasks using real robots as well as simulations.
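The sketch below (not taken from the paper) illustrates the core idea in PyTorch-style code: a vision network pre-trained for a mid-level objective such as depth or surface-normal estimation is kept frozen, and its features serve as the perceptual state for a small policy head, so only the policy receives RL gradients. The class names and dimensions are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a policy built on a frozen mid-level visual representation.
# "encoder" stands for any network pre-trained on a vision objective (assumed here);
# it is trained asynchronously and not updated by the RL loss.
import torch
import torch.nn as nn

class MidLevelPolicy(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, num_actions: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # freeze the mid-level representation
        self.policy_head = nn.Sequential(    # only this part is trained with RL
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                # no gradients flow into the encoder
            feats = self.encoder(rgb).flatten(start_dim=1)
        return self.policy_head(feats)       # action scores for the RL agent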

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-chen21f,
  title     = {Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation},
  author    = {Chen, Bryan and Sax, Alexander and Lewis, Francis and Armeni, Iro and Savarese, Silvio and Zamir, Amir and Malik, Jitendra and Pinto, Lerrel},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {2328--2346},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/chen21f/chen21f.pdf},
  url       = {https://proceedings.mlr.press/v155/chen21f.html},
  abstract  = {Vision-based robotics often factors the control loop into separate components for perception and control. Conventional perception components usually extract hand-engineered features from the visual input that are then used by the control component in an explicit manner. In contrast, recent advances in deep RL make it possible to learn these features end-to-end during training, but the final result is often brittle, fails unexpectedly under minuscule visual shifts, and comes with a high sample complexity cost. In this work, we study the effects of using mid-level visual representations asynchronously trained for traditional computer vision objectives as a generic and easy-to-decode perceptual state in an end-to-end RL framework. We show that the invariances provided by the mid-level representations aid generalization, improve sample complexity, and lead to a higher final performance. Compared to the alternative approaches for incorporating invariances, such as domain randomization, using asynchronously trained mid-level representations scale better to harder problems and larger domain shifts, and consequently, successfully trains policies for tasks where domain randomization or learning-from-scratch failed. Our experimental findings are reported on manipulation and navigation tasks using real robots as well as simulations.}
}
Endnote
%0 Conference Paper
%T Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation
%A Bryan Chen
%A Alexander Sax
%A Francis Lewis
%A Iro Armeni
%A Silvio Savarese
%A Amir Zamir
%A Jitendra Malik
%A Lerrel Pinto
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-chen21f
%I PMLR
%P 2328--2346
%U https://proceedings.mlr.press/v155/chen21f.html
%V 155
%X Vision-based robotics often factors the control loop into separate components for perception and control. Conventional perception components usually extract hand-engineered features from the visual input that are then used by the control component in an explicit manner. In contrast, recent advances in deep RL make it possible to learn these features end-to-end during training, but the final result is often brittle, fails unexpectedly under minuscule visual shifts, and comes with a high sample complexity cost. In this work, we study the effects of using mid-level visual representations asynchronously trained for traditional computer vision objectives as a generic and easy-to-decode perceptual state in an end-to-end RL framework. We show that the invariances provided by the mid-level representations aid generalization, improve sample complexity, and lead to a higher final performance. Compared to the alternative approaches for incorporating invariances, such as domain randomization, using asynchronously trained mid-level representations scale better to harder problems and larger domain shifts, and consequently, successfully trains policies for tasks where domain randomization or learning-from-scratch failed. Our experimental findings are reported on manipulation and navigation tasks using real robots as well as simulations.
APA
Chen, B., Sax, A., Lewis, F., Armeni, I., Savarese, S., Zamir, A., Malik, J. & Pinto, L. (2021). Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:2328-2346. Available from https://proceedings.mlr.press/v155/chen21f.html.