A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs

Jingkai Mao; Jakob Foerster; Tim Rocktäschel; Maruan Al-Shedivat; Gregory Farquhar; Shimon Whiteson

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4343-4351, 2019.

Abstract

By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation. This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself or of the first order gradient estimate. It reuses the same baseline function (e.g., the state-value function in reinforcement learning) already used for the first order baseline. We provide theoretical analysis and numerical evaluations of this new baseline, which demonstrate that it can dramatically reduce the variance of DiCE’s second order gradient estimators and also show empirically that it reduces the variance of third and fourth order gradients. This computational tool can be easily used to estimate higher order gradients with unprecedented efficiency and simplicity wherever automatic differentiation is utilised, and it has the potential to unlock applications of higher order gradients in reinforcement learning and meta-learning.
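The abstract describes the baseline as a control variate that reduces variance without biasing the estimate. As a minimal sketch of that idea only, the Python below works through a first order score-function estimator for a single Bernoulli variable. It is not DiCE and not the higher order baseline proposed in the paper; the cost function `f`, the parameter `theta`, and the helper names are all hypothetical and chosen for illustration.

```python
def score(x, theta):
    """Derivative of log p(x; theta) w.r.t. theta for x ~ Bernoulli(theta)."""
    return 1.0 / theta if x == 1 else -1.0 / (1.0 - theta)


def estimator_moments(f, theta, baseline=0.0):
    """Exact mean and variance of the estimator (f(x) - baseline) * score(x, theta).

    Both outcomes of the Bernoulli variable are enumerated, so no sampling
    noise is involved.
    """
    outcomes = [(1, theta), (0, 1.0 - theta)]
    weighted = [((f(x) - baseline) * score(x, theta), p) for x, p in outcomes]
    mean = sum(g * p for g, p in weighted)
    var = sum((g - mean) ** 2 * p for g, p in weighted)
    return mean, var


def f(x):
    # Hypothetical cost with a large constant offset, which inflates variance.
    return 10.0 + x


theta = 0.5

# No baseline: unbiased, but the constant offset in f dominates the variance.
plain = estimator_moments(f, theta)

# Baseline set to the expected cost (assumption: analogous to a value function).
b = theta * f(1) + (1.0 - theta) * f(0)
with_baseline = estimator_moments(f, theta, baseline=b)

print("no baseline:   mean=%.3f var=%.3f" % plain)
print("with baseline: mean=%.3f var=%.3f" % with_baseline)
```

In this toy setting both estimators have the same mean, f(1) - f(0), while the variance with the baseline is lower. Subtracting the baseline leaves the expectation unchanged because the score function has zero mean under the sampling distribution.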

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-mao19a,
  title = 	 {A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs},
  author =       {Mao, Jingkai and Foerster, Jakob and Rockt{\"a}schel, Tim and Al-Shedivat, Maruan and Farquhar, Gregory and Whiteson, Shimon},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {4343--4351},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/mao19a/mao19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/mao19a.html},
  abstract = 	 {By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation. This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself or of the first order gradient estimate. It reuses the same baseline function (e.g., the state-value function in reinforcement learning) already used for the first order baseline. We provide theoretical analysis and numerical evaluations of this new baseline, which demonstrate that it can dramatically reduce the variance of DiCE’s second order gradient estimators and also show empirically that it reduces the variance of third and fourth order gradients. This computational tool can be easily used to estimate higher order gradients with unprecedented efficiency and simplicity wherever automatic differentiation is utilised, and it has the potential to unlock applications of higher order gradients in reinforcement learning and meta-learning.}
}

Endnote

%0 Conference Paper
%T A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
%A Jingkai Mao
%A Jakob Foerster
%A Tim Rocktäschel
%A Maruan Al-Shedivat
%A Gregory Farquhar
%A Shimon Whiteson
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-mao19a
%I PMLR
%P 4343--4351
%U https://proceedings.mlr.press/v97/mao19a.html
%V 97
%X By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation. This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself or of the first order gradient estimate. It reuses the same baseline function (e.g., the state-value function in reinforcement learning) already used for the first order baseline. We provide theoretical analysis and numerical evaluations of this new baseline, which demonstrate that it can dramatically reduce the variance of DiCE’s second order gradient estimators and also show empirically that it reduces the variance of third and fourth order gradients. This computational tool can be easily used to estimate higher order gradients with unprecedented efficiency and simplicity wherever automatic differentiation is utilised, and it has the potential to unlock applications of higher order gradients in reinforcement learning and meta-learning.

APA

Mao, J., Foerster, J., Rocktäschel, T., Al-Shedivat, M., Farquhar, G. & Whiteson, S. (2019). A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4343-4351. Available from https://proceedings.mlr.press/v97/mao19a.html.
