Modular Block-diagonal Curvature Approximations for Feedforward Architectures

Felix Dangel, Stefan Harmeling, Philipp Hennig
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:799-808, 2020.

Abstract

We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
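To make the modular idea concrete, here is a minimal sketch of curvature backpropagation for the generalized Gauss-Newton (GGN) of a tiny linear-sigmoid-MSE network. This is an illustration under common conventions (column-stacked vec, a single input sample), not the authors' implementation; the network, module boundaries, and all variable names are our own choices.

# Minimal sketch (not the paper's code): modular backpropagation of the
# generalized Gauss-Newton (GGN) through a tiny network
#   x -> Linear(W) -> sigmoid -> 0.5 * ||z - target||^2
# Each module receives the curvature of the loss w.r.t. its output and
# produces (i) the curvature w.r.t. its input and (ii) the block-diagonal
# block for its own parameters.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 3, 2
x = rng.normal(size=d_in)            # single input sample
W = rng.normal(size=(d_out, d_in))   # linear-layer weights

# Forward pass
s = W @ x                            # linear-module output
z = 1.0 / (1.0 + np.exp(-s))         # sigmoid-module output

# --- Curvature backpropagation, from the loss back to the input ---
# MSE loss: its Hessian w.r.t. the network output z is the identity
# (independent of the regression target).
H_z = np.eye(d_out)

# Sigmoid module: the Jacobian is diagonal. For the GGN, only the
# Jacobian-sandwich term J^T H J is propagated; the second-order
# residual term would be added for the full Hessian / PCH variants.
dz = z * (1.0 - z)                   # sigmoid derivative at s
H_s = np.diag(dz) @ H_z @ np.diag(dz)

# Linear module: with column-stacked vec(W), the Jacobian of s = W x
# w.r.t. vec(W) is kron(x^T, I), so the parameter block factorizes as
#   G_vec(W) = kron(x x^T, H_s),
# and the curvature handed to the preceding module is W^T H_s W.
G_W = np.kron(np.outer(x, x), H_s)   # block-diagonal GGN block for W
H_x = W.T @ H_s @ W                  # curvature w.r.t. the module input

# Sanity check against the direct definition G = J^T H_z J, using the
# full Jacobian of z w.r.t. vec(W).
J = np.diag(dz) @ np.kron(x[None, :], np.eye(d_out))
assert np.allclose(G_W, J.T @ H_z @ J)
print("GGN block for W:", G_W.shape, "| curvature w.r.t. input:", H_x.shape)

In the paper's framework, each module additionally contributes its own second-order term when the exact Hessian (rather than the GGN) is backpropagated; the sketch above drops that residual, which is exactly what distinguishes the GGN from the full Hessian block.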

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-dangel20a,
  title     = {Modular Block-diagonal Curvature Approximations for Feedforward Architectures},
  author    = {Dangel, Felix and Harmeling, Stefan and Hennig, Philipp},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {799--808},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/dangel20a/dangel20a.pdf},
  url       = {https://proceedings.mlr.press/v108/dangel20a.html},
  abstract  = {We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.}
}
Endnote
%0 Conference Paper
%T Modular Block-diagonal Curvature Approximations for Feedforward Architectures
%A Felix Dangel
%A Stefan Harmeling
%A Philipp Hennig
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-dangel20a
%I PMLR
%P 799--808
%U https://proceedings.mlr.press/v108/dangel20a.html
%V 108
%X We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
APA
Dangel, F., Harmeling, S. & Hennig, P. (2020). Modular Block-diagonal Curvature Approximations for Feedforward Architectures. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:799-808. Available from https://proceedings.mlr.press/v108/dangel20a.html.
