Can Forward Gradient Match Backpropagation?

Louis Fournier, Stephane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10249-10264, 2023.

Abstract

Forward Gradients - the idea of using directional derivatives computed in forward-mode differentiation - have recently been shown to be usable for neural network training while avoiding problems generally associated with backpropagation, such as update locking and memory requirements. The cost is having to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic distributions of guess vectors, we propose to strongly bias the gradient guesses toward much more promising directions, such as the feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study that systematically covers a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.
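
The abstract describes computing a forward gradient as a directional derivative along a guessed direction, where the guess is either an isotropic random vector or feedback from a small local auxiliary network. Below is a minimal, hypothetical JAX sketch of that idea, not the authors' implementation: the toy network, the data shapes, and the stand-in auxiliary loss are assumptions made purely for illustration.

# A minimal sketch of a forward-gradient estimator using JAX's forward-mode jvp.
# The toy linear "network" and the stand-in "local" auxiliary loss are illustrative.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model with a squared-error global loss.
    preds = x @ params["w"] + params["b"]
    return jnp.mean((preds - y) ** 2)

def forward_gradient(params, x, y, guess):
    # One forward-mode pass gives the directional derivative of the loss along
    # `guess`; the forward gradient is that scalar times the guess,
    # g_fwd = (grad L . v) v, with no backward pass through the global loss.
    loss, dir_deriv = jax.jvp(lambda p: loss_fn(p, x, y), (params,), (guess,))
    return jax.tree_util.tree_map(lambda v: dir_deriv * v, guess), loss

key = jax.random.PRNGKey(0)
kx, ky, kw, kb = jax.random.split(key, 4)
x = jax.random.normal(kx, (32, 16))
y = jax.random.normal(ky, (32, 4))
params = {"w": jnp.zeros((16, 4)), "b": jnp.zeros((4,))}

# (a) Isotropic guess: an i.i.d. Gaussian tangent, as in prior Forward Gradient work.
random_guess = {"w": jax.random.normal(kw, (16, 4)), "b": jax.random.normal(kb, (4,))}
g_random, _ = forward_gradient(params, x, y, random_guess)

# (b) Biased guess: the gradient of a cheap auxiliary loss (a hypothetical stand-in
# for the paper's local auxiliary networks) used as the candidate direction.
aux_loss = lambda p: jnp.mean((x @ p["w"] + p["b"]) ** 2)
local_guess = jax.grad(aux_loss)(params)
g_local, _ = forward_gradient(params, x, y, local_guess)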

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-fournier23a,
  title     = {Can Forward Gradient Match Backpropagation?},
  author    = {Fournier, Louis and Rivaud, Stephane and Belilovsky, Eugene and Eickenberg, Michael and Oyallon, Edouard},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {10249--10264},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/fournier23a/fournier23a.pdf},
  url       = {https://proceedings.mlr.press/v202/fournier23a.html},
  abstract  = {Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.}
}
Endnote
%0 Conference Paper
%T Can Forward Gradient Match Backpropagation?
%A Louis Fournier
%A Stephane Rivaud
%A Eugene Belilovsky
%A Michael Eickenberg
%A Edouard Oyallon
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-fournier23a
%I PMLR
%P 10249--10264
%U https://proceedings.mlr.press/v202/fournier23a.html
%V 202
%X Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.
APA
Fournier, L., Rivaud, S., Belilovsky, E., Eickenberg, M. & Oyallon, E. (2023). Can Forward Gradient Match Backpropagation?. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10249-10264. Available from https://proceedings.mlr.press/v202/fournier23a.html.
