On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline

Nicklas Hansen; Zhecheng Yuan; Yanjie Ze; Tongzhou Mu; Aravind Rajeswaran; Hao Su; Huazhe Xu; Xiaolong Wang

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:12511-12526, 2023.

Abstract

In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets – across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area. Code: https://github.com/gemcollector/learning-from-scratch.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-hansen23c,
  title = 	 {On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline},
  author =       {Hansen, Nicklas and Yuan, Zhecheng and Ze, Yanjie and Mu, Tongzhou and Rajeswaran, Aravind and Su, Hao and Xu, Huazhe and Wang, Xiaolong},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {12511--12526},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/hansen23c/hansen23c.pdf},
  url = 	 {https://proceedings.mlr.press/v202/hansen23c.html},
  abstract = 	 {In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets – across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area. Code: https://github.com/gemcollector/learning-from-scratch.}
}

Endnote

%0 Conference Paper
%T On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
%A Nicklas Hansen
%A Zhecheng Yuan
%A Yanjie Ze
%A Tongzhou Mu
%A Aravind Rajeswaran
%A Hao Su
%A Huazhe Xu
%A Xiaolong Wang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-hansen23c
%I PMLR
%P 12511--12526
%U https://proceedings.mlr.press/v202/hansen23c.html
%V 202
%X In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets – across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area. Code: https://github.com/gemcollector/learning-from-scratch.

APA


Hansen, N., Yuan, Z., Ze, Y., Mu, T., Rajeswaran, A., Su, H., Xu, H., & Wang, X. (2023). On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:12511-12526. Available from https://proceedings.mlr.press/v202/hansen23c.html.
