Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning

Archit Sharma, Ahmed M. Ahmed, Rehaan Ahmad, Chelsea Finn
Proceedings of The 7th Conference on Robot Learning, PMLR 229:3292-3308, 2023.

Abstract

In imitation and reinforcement learning (RL), the cost of human supervision limits the amount of data that robots can be trained on. While RL offers a framework for building self-improving robots that can learn autonomously via trial and error, practical realizations end up requiring extensive human supervision for reward function design and for repeatedly resetting the environment between episodes of interaction. In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations. The policy and reward function are learned end-to-end from high-dimensional visual inputs, bypassing the need for explicit state estimation or the task-specific pre-training of visual encoders used in prior work. We first evaluate our proposed system on EARL, a simulated non-episodic benchmark, finding that MEDAL++ is both more data efficient and achieves up to 30% better final performance than state-of-the-art vision-based methods. Our real-robot experiments show that MEDAL++ can be applied to manipulation problems in larger environments than those considered in prior work, and that autonomous self-improvement can improve the success rate by 30% to 70% over behavioral cloning on the expert data alone.

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-sharma23b,
  title     = {Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning},
  author    = {Sharma, Archit and Ahmed, Ahmed M. and Ahmad, Rehaan and Finn, Chelsea},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {3292--3308},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/sharma23b/sharma23b.pdf},
  url       = {https://proceedings.mlr.press/v229/sharma23b.html},
  abstract  = {In imitation and reinforcement learning (RL), the cost of human supervision limits the amount of data that the robots can be trained on. While RL offers a framework for building self-improving robots that can learn via trial-and-error autonomously, practical realizations end up requiring extensive human supervision for reward function design and repeated resetting of the environment between episodes of interactions. In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations. The policy and reward function are learned end-to-end from high-dimensional visual inputs, bypassing the need for explicit state estimation or task-specific pre-training for visual encoders used in prior work. We first evaluate our proposed system on a simulated non-episodic benchmark EARL, finding that MEDAL++ is both more data efficient and gets up to $30\%$ better final performance compared to state-of-the-art vision-based methods. Our real-robot experiments show that MEDAL++ can be applied to manipulation problems in larger environments than those considered in prior work, and autonomous self-improvement can improve the success rate by $30\%$ to $70\%$ over behavioral cloning on just the expert data.}
}
Endnote
%0 Conference Paper
%T Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning
%A Archit Sharma
%A Ahmed M. Ahmed
%A Rehaan Ahmad
%A Chelsea Finn
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-sharma23b
%I PMLR
%P 3292--3308
%U https://proceedings.mlr.press/v229/sharma23b.html
%V 229
%X In imitation and reinforcement learning (RL), the cost of human supervision limits the amount of data that the robots can be trained on. While RL offers a framework for building self-improving robots that can learn via trial-and-error autonomously, practical realizations end up requiring extensive human supervision for reward function design and repeated resetting of the environment between episodes of interactions. In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations. The policy and reward function are learned end-to-end from high-dimensional visual inputs, bypassing the need for explicit state estimation or task-specific pre-training for visual encoders used in prior work. We first evaluate our proposed system on a simulated non-episodic benchmark EARL, finding that MEDAL++ is both more data efficient and gets up to 30% better final performance compared to state-of-the-art vision-based methods. Our real-robot experiments show that MEDAL++ can be applied to manipulation problems in larger environments than those considered in prior work, and autonomous self-improvement can improve the success rate by 30% to 70% over behavioral cloning on just the expert data.
APA
Sharma, A., Ahmed, A.M., Ahmad, R., & Finn, C. (2023). Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:3292-3308. Available from https://proceedings.mlr.press/v229/sharma23b.html.
