Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning

Tianjian Chen, Zhanpeng He, Matei Ciocarlie
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:1158-1173, 2021.

Abstract

Deep Reinforcement Learning (RL) has shown great success in learning complex control policies for a variety of applications in robotics. However, in most such cases, the hardware of the robot has been considered immutable, modeled as part of the environment. In this study, we explore the problem of learning hardware and control parameters together in a unified RL framework. To achieve this, we propose to model the robot body as a “hardware policy”, analogous to and optimized jointly with its computational counterpart. We show that, by modeling such hardware policies as auto-differentiable computational graphs, the ensuing optimization problem can be solved efficiently by gradient-based algorithms from the Policy Optimization family. We present two such design examples: a toy mass-spring problem, and a real-world problem of designing an underactuated hand. We compare our method against traditional co-optimization approaches, and also demonstrate its effectiveness by building a physical prototype based on the learned hardware parameters. Videos and more details are available at https://roamlab.github.io/hwasp/.
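Below is a minimal, illustrative sketch of the idea described in the abstract: learnable hardware parameters are wrapped in a differentiable "hardware layer" placed after a small computational policy network, and both are updated jointly by a plain policy-gradient (REINFORCE) step, so gradients flow through the hardware parameters via the action distribution. The toy environment, the specific spring/transmission model, and all names here are assumptions for illustration only, not the authors' implementation (see the project page for their code and details).

```python
import torch
import torch.nn as nn


class HardwareLayer(nn.Module):
    """Differentiable stand-in for the robot body: maps a motor command to
    the force actually applied, through learnable mechanical parameters."""

    def __init__(self):
        super().__init__()
        # Learnable hardware parameters (log-space keeps them positive).
        self.log_ratio = nn.Parameter(torch.zeros(1))      # transmission ratio
        self.log_stiffness = nn.Parameter(torch.zeros(1))  # spring stiffness

    def forward(self, motor_cmd, displacement):
        ratio = self.log_ratio.exp()
        stiffness = self.log_stiffness.exp()
        # Applied force = geared motor command minus spring restoring force.
        return ratio * motor_cmd - stiffness * displacement


class JointPolicy(nn.Module):
    """Computational policy followed by the hardware layer, forming a single
    auto-differentiable computational graph."""

    def __init__(self, obs_dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.hardware = HardwareLayer()
        self.log_std = nn.Parameter(torch.zeros(1))

    def dist(self, obs):
        motor_cmd = self.net(obs)
        # Assumes the first observation entry is the displacement of the toy
        # mass-spring system (a modeling choice for this sketch).
        force = self.hardware(motor_cmd, displacement=obs[..., :1])
        return torch.distributions.Normal(force, self.log_std.exp())


def reinforce_step(policy, optimizer, env, horizon=50):
    """One REINFORCE update on a hypothetical gym-style env; gradients reach
    both the computational net and the hardware parameters through the
    log-probabilities of the sampled actions. No discounting or baseline,
    for brevity."""
    obs = env.reset()
    log_probs, rewards = [], []
    for _ in range(horizon):
        d = policy.dist(torch.as_tensor(obs, dtype=torch.float32))
        action = d.sample()
        obs, reward, done, _ = env.step(action.detach().numpy())
        log_probs.append(d.log_prob(action).sum())
        rewards.append(reward)
        if done:
            break
    loss = -sum(rewards) * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch, a single optimizer (e.g., torch.optim.Adam over policy.parameters()) updates mechanical and computational parameters together, and the learned hardware values can be read back from policy.hardware after training; the paper itself uses gradient-based Policy Optimization algorithms rather than this bare-bones REINFORCE loop.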

Cite this Paper


BibTeX
@InProceedings{pmlr-v155-chen21a,
  title     = {Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning},
  author    = {Chen, Tianjian and He, Zhanpeng and Ciocarlie, Matei},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {1158--1173},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/chen21a/chen21a.pdf},
  url       = {https://proceedings.mlr.press/v155/chen21a.html}
}
APA
Chen, T., He, Z., & Ciocarlie, M. (2021). Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:1158-1173. Available from https://proceedings.mlr.press/v155/chen21a.html.
