Sim2Real Transfer for Deep Reinforcement Learning with Stochastic State Transition Delays

Sandeep Singh Sandha, Luis Garcia, Bharathan Balaji, Fatima Anwar, Mani Srivastava
Proceedings of the 2020 Conference on Robot Learning, PMLR 155:1066-1083, 2021.

Abstract

Deep Reinforcement Learning (RL) has been shown to be useful for a wide variety of robotics applications. To address sample efficiency and safety during training, it is common to train Deep RL policies in a simulator and then deploy them to the real world, a process called Sim2Real transfer. For robotics applications, deployment heterogeneity and runtime compute stochasticity result in variable timing characteristics of sensor sampling rates and end-to-end delays from sensing to actuation. Prior works have used the technique of domain randomization to enable the successful transfer of policies across domains with different state transition delays. We show that variation in sampling rates and policy execution time leads to degradation in Deep RL policy performance, and that domain randomization is insufficient to overcome this limitation. We propose the Time-in-State RL (TSRL) approach, which includes delays and sampling rate as additional agent observations at training time to improve the robustness of Deep RL policies. We demonstrate the efficacy of TSRL on HalfCheetah, Ant, and a car robot in simulation, and on a real 1/18th-scale car robot.
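For concreteness, the core TSRL idea of exposing timing as part of the state can be sketched as a Gym-style observation wrapper that appends the measured delay and sampling period to each observation. This is a minimal illustrative sketch, not the authors' released implementation; the class name, the normalization bounds, and the set_timing hook are assumptions made here for the example.

# Sketch of the Time-in-State idea as a Gym observation wrapper.
# Assumptions (not from the paper): class name, max_delay/max_period
# normalization bounds, and the set_timing() hook supplying measured timing.
import numpy as np
import gym


class TimeInStateWrapper(gym.ObservationWrapper):
    """Appends the current sensing-to-actuation delay and the sensor
    sampling period (both normalized to [0, 1]) to the observation."""

    def __init__(self, env, max_delay=0.1, max_period=0.1):
        super().__init__(env)
        self.max_delay = max_delay    # assumed upper bound used for normalization
        self.max_period = max_period
        low = np.concatenate([env.observation_space.low, [0.0, 0.0]])
        high = np.concatenate([env.observation_space.high, [1.0, 1.0]])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)
        self.delay = 0.0
        self.period = 0.0

    def set_timing(self, delay, period):
        # Called each step by the runtime (or by a randomized simulator
        # during training) with the measured delay and sampling period.
        self.delay, self.period = delay, period

    def observation(self, obs):
        timing = np.array([self.delay / self.max_delay,
                           self.period / self.max_period], dtype=np.float32)
        return np.concatenate([obs, timing]).astype(np.float32)

During training, the simulator would randomize the delay and sampling period and report them through set_timing, so the policy learns to condition its actions on the timing values it will later observe on the real robot.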

Cite this Paper

BibTeX
@InProceedings{pmlr-v155-sandha21a,
  title     = {Sim2Real Transfer for Deep Reinforcement Learning with Stochastic State Transition Delays},
  author    = {Sandha, Sandeep Singh and Garcia, Luis and Balaji, Bharathan and Anwar, Fatima and Srivastava, Mani},
  booktitle = {Proceedings of the 2020 Conference on Robot Learning},
  pages     = {1066--1083},
  year      = {2021},
  editor    = {Kober, Jens and Ramos, Fabio and Tomlin, Claire},
  volume    = {155},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v155/sandha21a/sandha21a.pdf},
  url       = {https://proceedings.mlr.press/v155/sandha21a.html}
}
Endnote
%0 Conference Paper
%T Sim2Real Transfer for Deep Reinforcement Learning with Stochastic State Transition Delays
%A Sandeep Singh Sandha
%A Luis Garcia
%A Bharathan Balaji
%A Fatima Anwar
%A Mani Srivastava
%B Proceedings of the 2020 Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Jens Kober
%E Fabio Ramos
%E Claire Tomlin
%F pmlr-v155-sandha21a
%I PMLR
%P 1066--1083
%U https://proceedings.mlr.press/v155/sandha21a.html
%V 155
APA
Sandha, S.S., Garcia, L., Balaji, B., Anwar, F. & Srivastava, M. (2021). Sim2Real Transfer for Deep Reinforcement Learning with Stochastic State Transition Delays. Proceedings of the 2020 Conference on Robot Learning, in Proceedings of Machine Learning Research 155:1066-1083. Available from https://proceedings.mlr.press/v155/sandha21a.html.