Do Differentiable Simulators Give Better Policy Gradients?

Hyung Ju Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20668-20696, 2022.

Abstract

Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it remains unclear which factors determine the performance of the two estimators on complex landscapes that involve long-horizon planning and control of physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and we analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zeroth-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on several numerical examples.
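To make the interpolation concrete, the following is a minimal sketch in Python (NumPy) of an $\alpha$-order estimator under Gaussian randomized smoothing. The names (`alpha_order_gradient`, `f`, `grad_f`) and the fixed-$\alpha$ convex weighting are illustrative assumptions for exposition only; the paper's actual estimator, including how $\alpha$ is chosen, is specified in the full text, not here.

```python
import numpy as np

def alpha_order_gradient(f, grad_f, theta, alpha=0.5, sigma=0.1,
                         n_samples=100, rng=None):
    """Estimate the gradient of the Gaussian-smoothed objective
    F(theta) = E_w[f(theta + w)], w ~ N(0, sigma^2 I), by interpolating
    a first-order (pathwise) and a zeroth-order (function-value-only)
    Monte Carlo estimator. alpha=1 uses simulator gradients only;
    alpha=0 uses function evaluations only."""
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    noise = rng.normal(scale=sigma, size=(n_samples, theta.size))

    # Zeroth-order estimate: valid for the smoothed objective even when
    # f is stiff or discontinuous, but its variance scales like 1/sigma^2.
    baseline = f(theta)  # control variate; lowers variance, leaves mean intact
    grad_zo = np.mean([(f(theta + w) - baseline) * w for w in noise],
                      axis=0) / sigma**2

    # First-order estimate: averages exact gradients from the differentiable
    # simulator over the same noise; efficient on smooth landscapes, but it
    # can be misleading near discontinuities (the failure mode studied here).
    grad_fo = np.mean([grad_f(theta + w) for w in noise], axis=0)

    return alpha * grad_fo + (1.0 - alpha) * grad_zo

if __name__ == "__main__":
    # Smoke test on a smooth quadratic, where both estimators agree
    # in expectation: the true smoothed gradient at theta is 2 * theta.
    f = lambda x: float(np.dot(x, x))
    grad_f = lambda x: 2.0 * x
    theta = np.array([1.0, -2.0])
    print(alpha_order_gradient(f, grad_f, theta, alpha=0.5))  # approx. [2, -4]
```

In this sketch, $\alpha = 1$ recovers the pure first-order estimator and $\alpha = 0$ the pure zeroth-order one; intermediate values trade the efficiency of exact gradients against robustness to the stiffness and discontinuity pathologies the abstract describes.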

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-suh22b,
  title     = {Do Differentiable Simulators Give Better Policy Gradients?},
  author    = {Suh, Hyung Ju and Simchowitz, Max and Zhang, Kaiqing and Tedrake, Russ},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {20668--20696},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/suh22b/suh22b.pdf},
  url       = {https://proceedings.mlr.press/v162/suh22b.html},
  abstract  = {Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it remains unclear which factors determine the performance of the two estimators on complex landscapes that involve long-horizon planning and control of physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and we analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zeroth-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on several numerical examples.}
}
Endnote
%0 Conference Paper
%T Do Differentiable Simulators Give Better Policy Gradients?
%A Hyung Ju Suh
%A Max Simchowitz
%A Kaiqing Zhang
%A Russ Tedrake
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-suh22b
%I PMLR
%P 20668--20696
%U https://proceedings.mlr.press/v162/suh22b.html
%V 162
%X Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it remains unclear which factors determine the performance of the two estimators on complex landscapes that involve long-horizon planning and control of physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and we analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zeroth-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on several numerical examples.
APA
Suh, H.J., Simchowitz, M., Zhang, K. & Tedrake, R. (2022). Do Differentiable Simulators Give Better Policy Gradients?. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20668-20696. Available from https://proceedings.mlr.press/v162/suh22b.html.
