Variance Reduction for Evolution Strategies via Structured Control Variates

Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:646-656, 2020.

Abstract

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that recently became a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving the accuracy of ES algorithms that, in contrast to recent approaches utilizing only the Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce variance. We observe that the gradient estimator of the ES objective can be alternatively computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general-purpose variance reduction methods.
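As a rough illustration of the general control-variate idea behind the paper (not its MDP-structured construction), the sketch below estimates the vanilla ES gradient of a smoothed objective and subtracts a zero-mean correction built from a surrogate whose exact gradient is known. The surrogate, its slope `a`, and the per-coordinate coefficient `beta` are illustrative assumptions, not components of the authors' method.

```python
import numpy as np

def es_gradient_with_cv(f, theta, sigma=0.1, n=1000, seed=0):
    """ES gradient estimate of E_eps[f(theta + sigma*eps)] w.r.t. theta,
    variance-reduced with a generic control variate.

    Illustrative sketch only: the control variate here is the ES estimator
    applied to a cheap linear surrogate s(x) = a @ x, whose exact gradient
    (namely a) is known in closed form, so the correction has mean zero."""
    rng = np.random.default_rng(seed)
    d = theta.shape[0]
    eps = rng.standard_normal((n, d))
    fvals = np.array([f(theta + sigma * e) for e in eps])
    g_es = (fvals[:, None] * eps) / sigma        # per-sample ES gradient terms

    a = np.ones(d)                               # surrogate slope (assumed known)
    svals = (theta + sigma * eps) @ a
    cv = (svals[:, None] * eps) / sigma          # per-sample CV terms; E[cv] = a

    # Per-coordinate coefficient beta ~ Cov(g, cv) / Var(cv) minimizes variance.
    beta = (np.mean(g_es * cv, axis=0)
            - g_es.mean(0) * cv.mean(0)) / cv.var(axis=0)
    # Unbiased: subtracting beta * (cv_mean - E[cv]) leaves the mean unchanged.
    return g_es.mean(0) - beta * (cv.mean(0) - a)
```

The better the surrogate correlates with `f` around `theta`, the more variance the correction removes; the paper's contribution is to build such correlated estimators from the MDP structure itself rather than from a hand-picked surrogate.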

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-tang20a,
  title     = {Variance Reduction for Evolution Strategies via Structured Control Variates},
  author    = {Tang, Yunhao and Choromanski, Krzysztof and Kucukelbir, Alp},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {646--656},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/tang20a/tang20a.pdf},
  url       = {https://proceedings.mlr.press/v108/tang20a.html},
  abstract  = {Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that recently became a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving accuracy of the ES algorithms, that as opposed to recent approaches utilizing only Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce the variance. We observe that the gradient estimator of the ES objective can be alternatively computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general purpose variance reduction methods.}
}
Endnote
%0 Conference Paper
%T Variance Reduction for Evolution Strategies via Structured Control Variates
%A Yunhao Tang
%A Krzysztof Choromanski
%A Alp Kucukelbir
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-tang20a
%I PMLR
%P 646--656
%U https://proceedings.mlr.press/v108/tang20a.html
%V 108
%X Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that recently became a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving accuracy of the ES algorithms, that as opposed to recent approaches utilizing only Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce the variance. We observe that the gradient estimator of the ES objective can be alternatively computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general purpose variance reduction methods.
APA
Tang, Y., Choromanski, K. & Kucukelbir, A. (2020). Variance Reduction for Evolution Strategies via Structured Control Variates. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:646-656. Available from https://proceedings.mlr.press/v108/tang20a.html.