Bregman Neural Networks

Jordan Frecon, Gilles Gasso, Massimiliano Pontil, Saverio Salzo
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6779-6792, 2022.

Abstract

We present a framework based on bilevel optimization for learning multilayer, deep data representations. On the one hand, the lower-level problem finds a representation by successively minimizing layer-wise objectives made of the sum of a prescribed regularizer, a fidelity term, and a linear function, the latter two depending on the representation found at the previous layer. On the other hand, the upper-level problem optimizes over the linear functions to yield a linearly separable final representation. We show that, by choosing the fidelity term as the quadratic distance between two successive layer-wise representations, the bilevel problem reduces to the training of a feed-forward neural network. By elaborating on Bregman distances instead, we devise a novel neural network architecture additionally involving the inverse of the activation function, reminiscent of the skip connection used in ResNets. Numerical experiments suggest that the proposed Bregman variant benefits from better learning properties and more robust prediction performance.
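For illustration only, the sketch below (PyTorch, not the authors' released code) instantiates a layer of the kind the abstract describes: the new representation is obtained by applying the activation to the inverse-activated previous representation plus a learned linear term, so the inverse activation plays a role reminiscent of a skip connection. The update rule x_{k+1} = sigma(sigma^{-1}(x_k) + W_k x_k + b_k), the choice sigma = sigmoid, and the names BregmanLayer and eps are assumptions made for this example.

# Minimal sketch, not the authors' implementation: a layer whose output is the
# activation applied to (inverse-activation of the previous representation plus
# a learned linear map of it), i.e. x_{k+1} = sigma(sigma^{-1}(x_k) + W x_k + b).
# Here sigma is the sigmoid, so sigma^{-1} is the logit; all names are illustrative.
import torch
import torch.nn as nn


class BregmanLayer(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # the learned linear function W x + b
        self.eps = eps                     # keeps logit's argument inside (0, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.clamp(self.eps, 1.0 - self.eps)  # ensure sigma^{-1}(x) is finite
        return torch.sigmoid(torch.logit(x) + self.linear(x))


if __name__ == "__main__":
    # Toy usage: an initial sigmoid maps raw inputs into (0, 1), the domain of logit.
    net = nn.Sequential(nn.Sigmoid(), *(BregmanLayer(16) for _ in range(3)))
    x = torch.randn(8, 16)
    print(net(x).shape)  # torch.Size([8, 16])

With the quadratic fidelity term the same stack would reduce to an ordinary feed-forward layer x_{k+1} = sigma(W x_k + b); the torch.logit term above is what the Bregman construction adds.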

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-frecon22a,
  title     = {{B}regman Neural Networks},
  author    = {Frecon, Jordan and Gasso, Gilles and Pontil, Massimiliano and Salzo, Saverio},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {6779--6792},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/frecon22a/frecon22a.pdf},
  url       = {https://proceedings.mlr.press/v162/frecon22a.html},
  abstract  = {We present a framework based on bilevel optimization for learning multilayer, deep data representations. On the one hand, the lower-level problem finds a representation by successively minimizing layer-wise objectives made of the sum of a prescribed regularizer as well as a fidelity term and some linear function both depending on the representation found at the previous layer. On the other hand, the upper-level problem optimizes over the linear functions to yield a linearly separable final representation. We show that, by choosing the fidelity term as the quadratic distance between two successive layer-wise representations, the bilevel problem reduces to the training of a feed-forward neural network. Instead, by elaborating on Bregman distances, we devise a novel neural network architecture additionally involving the inverse of the activation function reminiscent of the skip connection used in ResNets. Numerical experiments suggest that the proposed Bregman variant benefits from better learning properties and more robust prediction performance.}
}
EndNote
%0 Conference Paper
%T Bregman Neural Networks
%A Jordan Frecon
%A Gilles Gasso
%A Massimiliano Pontil
%A Saverio Salzo
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-frecon22a
%I PMLR
%P 6779--6792
%U https://proceedings.mlr.press/v162/frecon22a.html
%V 162
%X We present a framework based on bilevel optimization for learning multilayer, deep data representations. On the one hand, the lower-level problem finds a representation by successively minimizing layer-wise objectives made of the sum of a prescribed regularizer as well as a fidelity term and some linear function both depending on the representation found at the previous layer. On the other hand, the upper-level problem optimizes over the linear functions to yield a linearly separable final representation. We show that, by choosing the fidelity term as the quadratic distance between two successive layer-wise representations, the bilevel problem reduces to the training of a feed-forward neural network. Instead, by elaborating on Bregman distances, we devise a novel neural network architecture additionally involving the inverse of the activation function reminiscent of the skip connection used in ResNets. Numerical experiments suggest that the proposed Bregman variant benefits from better learning properties and more robust prediction performance.
APA
Frecon, J., Gasso, G., Pontil, M. & Salzo, S. (2022). Bregman Neural Networks. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6779-6792. Available from https://proceedings.mlr.press/v162/frecon22a.html.