Modular Block-diagonal Curvature Approximations for Feedforward Architectures

Felix Dangel, Stefan Harmeling, Philipp Hennig
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:799-808, 2020.

Abstract

We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
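To make the abstract's idea concrete: each module of the network receives the curvature of the loss with respect to its output, maps it back to a curvature with respect to its input, and extracts its own parameter block along the way, exactly mirroring how backpropagation handles gradients. The NumPy sketch below is an illustrative reconstruction of this scheme for a two-layer network, not the authors' implementation; the sigmoid/MSE choice and all variable names (W1, W2, H_z2, ...) are our assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Two-layer net: z1 = W1 x, a = sigmoid(z1), z2 = W2 a, loss = 0.5*||z2 - y||^2
d_in, d_hid, d_out = 3, 4, 2
W1 = rng.standard_normal((d_hid, d_in))
W2 = rng.standard_normal((d_out, d_hid))
x = rng.standard_normal(d_in)
y = rng.standard_normal(d_out)

# Forward pass, storing each module's input.
z1 = W1 @ x
a = 1.0 / (1.0 + np.exp(-z1))
z2 = W2 @ a

# Backward pass: propagate the gradient g and the Hessian H module by module.
g_z2 = z2 - y               # gradient of 0.5*||z2 - y||^2 w.r.t. z2
H_z2 = np.eye(d_out)        # Hessian of the loss w.r.t. z2

# Linear module z2 = W2 a: input Hessian is W2^T H_z2 W2;
# the parameter block for vec(W2) is the Kronecker product (a a^T) ⊗ H_z2.
g_a = W2.T @ g_z2
H_a = W2.T @ H_z2 @ W2
H_W2 = np.kron(np.outer(a, a), H_z2)

# Sigmoid module a = σ(z1): elementwise chain rule gives
# H_z1 = diag(σ') H_a diag(σ') + diag(σ'' ⊙ g_a).
# Dropping the second (residual) term yields the generalized
# Gauss-Newton block; clipping it to its positive part yields the
# positive-curvature Hessian mentioned in the abstract.
s1 = a * (1 - a)            # σ'(z1)
s2 = s1 * (1 - 2 * a)       # σ''(z1)
H_z1 = (s1[:, None] * H_a * s1[None, :]) + np.diag(s2 * g_a)

# Linear module z1 = W1 x: parameter block for vec(W1).
H_W1 = np.kron(np.outer(x, x), H_z1)

print("block for W2:", H_W2.shape)   # (d_hid*d_out, d_hid*d_out)
print("block for W1:", H_W1.shape)   # (d_in*d_hid, d_in*d_hid)

Arranging H_W1 and H_W2 along the diagonal gives the block-diagonal curvature approximation the abstract refers to; each block is computed by purely local rules, which is what makes the scheme easy to bolt onto existing automatic differentiation.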

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-dangel20a,
  title     = {Modular Block-diagonal Curvature Approximations for Feedforward Architectures},
  author    = {Dangel, Felix and Harmeling, Stefan and Hennig, Philipp},
  pages     = {799--808},
  year      = {2020},
  editor    = {Silvia Chiappa and Roberto Calandra},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  address   = {Online},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/dangel20a/dangel20a.pdf},
  url       = {http://proceedings.mlr.press/v108/dangel20a.html},
  abstract  = {We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.}
}
Endnote
%0 Conference Paper
%T Modular Block-diagonal Curvature Approximations for Feedforward Architectures
%A Felix Dangel
%A Stefan Harmeling
%A Philipp Hennig
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-dangel20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 799--808
%U http://proceedings.mlr.press
%V 108
%W PMLR
%X We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
APA
Dangel, F., Harmeling, S., & Hennig, P. (2020). Modular Block-diagonal Curvature Approximations for Feedforward Architectures. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:799-808.
