Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single

Paul Vicol

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:35084-35119, 2023.

Abstract

We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. Like the recently proposed Persistent Evolution Strategies (PES), ES-Single is unbiased, and overcomes chaos arising from recursive function applications by smoothing the meta-loss landscape. ES-Single samples a single perturbation per particle, which is kept fixed over the course of an inner problem (i.e., perturbations are not re-sampled for each partial unroll). Compared to PES, ES-Single is simpler to implement and has lower variance: the variance of ES-Single is constant with respect to the number of truncated unrolls, removing a key barrier in applying ES to long inner problems using short truncations. We show that ES-Single is unbiased for quadratic inner problems, and demonstrate empirically that its variance can be substantially lower than that of PES. ES-Single consistently outperforms PES on a variety of tasks, including a synthetic benchmark task, hyperparameter optimization, training recurrent neural networks, and training learned optimizers.
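
The core idea in the abstract, one perturbation per particle held fixed across all truncated unrolls, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The toy inner problem (gradient descent on 0.5 * x^2 with the step size as the meta-parameter), the antithetic pairing, the per-truncation estimate eps * (L+ - L-) / (2 * sigma^2), and the names `unroll` and `es_single_gradients` are all assumptions made for illustration.

```python
import numpy as np


def unroll(theta, state, num_steps):
    """Toy inner problem (assumed): gradient descent on 0.5 * x**2.

    theta is the step size (the meta-parameter). Returns the final inner
    state and the loss summed over the truncation window. Works elementwise
    on arrays, so all particles are unrolled at once.
    """
    total_loss = np.zeros_like(state)
    for _ in range(num_steps):
        state = state - theta * state  # gradient of 0.5 * x**2 is x
        total_loss = total_loss + 0.5 * state ** 2
    return state, total_loss


def es_single_gradients(theta, sigma, num_particles, truncation_len,
                        num_truncations, rng):
    """One gradient estimate per truncated unroll, ES-Single style (sketch)."""
    # Sample ONE perturbation per particle, at the start of the inner problem.
    eps = rng.normal(0.0, sigma, size=num_particles)
    # Antithetic pair of inner states per particle (an assumption here).
    pos_state = np.ones(num_particles)
    neg_state = np.ones(num_particles)
    grads = []
    for _ in range(num_truncations):
        # The same eps is reused for every partial unroll; it is never re-sampled.
        pos_state, pos_loss = unroll(theta + eps, pos_state, truncation_len)
        neg_state, neg_loss = unroll(theta - eps, neg_state, truncation_len)
        grads.append(np.mean(eps * (pos_loss - neg_loss)) / (2.0 * sigma ** 2))
    return np.array(grads)


rng = np.random.default_rng(0)
grads = es_single_gradients(theta=0.1, sigma=0.01, num_particles=1000,
                            truncation_len=5, num_truncations=4, rng=rng)
```

In this toy setting a larger step size (below 1) lowers the loss in every window, so each per-truncation estimate comes out negative. A PES-style estimator would instead draw a fresh perturbation for each partial unroll and accumulate them per particle, which is the step this sketch omits.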

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-vicol23a,
  title = 	 {Low-Variance Gradient Estimation in Unrolled Computation Graphs with {ES}-Single},
  author =       {Vicol, Paul},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {35084--35119},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/vicol23a/vicol23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/vicol23a.html},
  abstract = 	 {We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. Similarly to the recently-proposed Persistent Evolution Strategies (PES), ES-Single is unbiased, and overcomes chaos arising from recursive function applications by smoothing the meta-loss landscape. ES-Single samples a single perturbation per particle, that is kept fixed over the course of an inner problem (e.g., perturbations are not re-sampled for each partial unroll). Compared to PES, ES-Single is simpler to implement and has lower variance: the variance of ES-Single is constant with respect to the number of truncated unrolls, removing a key barrier in applying ES to long inner problems using short truncations. We show that ES-Single is unbiased for quadratic inner problems, and demonstrate empirically that its variance can be substantially lower than that of PES. ES-Single consistently outperforms PES on a variety of tasks, including a synthetic benchmark task, hyperparameter optimization, training recurrent neural networks, and training learned optimizers.}
}

Endnote

%0 Conference Paper
%T Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
%A Paul Vicol
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-vicol23a
%I PMLR
%P 35084--35119
%U https://proceedings.mlr.press/v202/vicol23a.html
%V 202
%X We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. Similarly to the recently-proposed Persistent Evolution Strategies (PES), ES-Single is unbiased, and overcomes chaos arising from recursive function applications by smoothing the meta-loss landscape. ES-Single samples a single perturbation per particle, that is kept fixed over the course of an inner problem (e.g., perturbations are not re-sampled for each partial unroll). Compared to PES, ES-Single is simpler to implement and has lower variance: the variance of ES-Single is constant with respect to the number of truncated unrolls, removing a key barrier in applying ES to long inner problems using short truncations. We show that ES-Single is unbiased for quadratic inner problems, and demonstrate empirically that its variance can be substantially lower than that of PES. ES-Single consistently outperforms PES on a variety of tasks, including a synthetic benchmark task, hyperparameter optimization, training recurrent neural networks, and training learned optimizers.

APA


Vicol, P. (2023). Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:35084-35119. Available from https://proceedings.mlr.press/v202/vicol23a.html.
