Is Kernel Prediction More Powerful than Gating in Convolutional Neural Networks?

Lorenz K Muller
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:36591-36604, 2024.

Abstract

Neural networks whose weights are the output of a predictor (HyperNetworks) achieve excellent performance on many tasks. In ConvNets, kernel prediction layers are a popular type of HyperNetwork. Previous theoretical work has argued that a hierarchy of multiplicative interactions exists in which gating is at the bottom and full weight prediction, as in HyperNetworks, is at the top. In this paper, we constructively demonstrate an equivalence between gating combined with fixed weight layers and weight prediction, relativizing the notion of a hierarchy of multiplicative interactions. We further derive an equivalence between a restricted type of HyperNetwork and factorization machines. Finally, we find empirically that gating layers can learn to imitate weight prediction layers with an SGD variant and show a novel practical application in image denoising using kernel prediction networks. Our reformulation of predicted kernels, combining fixed layers and gating, reduces memory requirements.
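The memory saving mentioned in the abstract follows from the linearity of convolution: if a predicted kernel is a gated linear combination of a fixed kernel bank, K(x) = sum_i g_i(x) W_i, then convolving the input with K(x) equals gating the outputs of the fixed convolutions, sum_i g_i(x) (W_i * x), so the per-input kernel never has to be materialized. The minimal PyTorch sketch below checks this identity numerically; it is an illustration under that assumption, not the paper's code, and the names (kernel_bank, gates) are invented for the example.

# Sketch: a kernel "predicted" as a gated sum of fixed kernels, applied by
# convolution, equals gating the outputs of the fixed convolutions (linearity).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C_in, C_out, H, W, K, N = 2, 3, 4, 8, 8, 3, 5  # N fixed kernels in the bank

x = torch.randn(B, C_in, H, W)                    # input batch
kernel_bank = torch.randn(N, C_out, C_in, K, K)   # fixed weights W_i
gates = torch.randn(B, N)                         # input-dependent gates g_i(x)

# (a) "kernel prediction": materialize a per-sample kernel, then convolve.
out_pred = torch.stack([
    F.conv2d(x[b:b + 1],
             (gates[b, :, None, None, None, None] * kernel_bank).sum(dim=0),
             padding=K // 2)
    for b in range(B)
]).squeeze(1)

# (b) "fixed layers + gating": convolve with each fixed kernel once,
# then take the gated sum of the resulting feature maps.
feats = torch.stack([F.conv2d(x, w, padding=K // 2) for w in kernel_bank])
out_gate = (gates.T[:, :, None, None, None] * feats).sum(dim=0)

assert torch.allclose(out_pred, out_gate, atol=1e-4)  # identical by linearity

Form (a) builds a separate C_out x C_in x K x K kernel for every input before convolving; form (b) stores only the N fixed kernels plus N gated feature maps, which is the kind of memory reduction the abstract alludes to.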

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-muller24a,
  title     = {Is Kernel Prediction More Powerful than Gating in Convolutional Neural Networks?},
  author    = {Muller, Lorenz K},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {36591--36604},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/muller24a/muller24a.pdf},
  url       = {https://proceedings.mlr.press/v235/muller24a.html}
}
Endnote
%0 Conference Paper
%T Is Kernel Prediction More Powerful than Gating in Convolutional Neural Networks?
%A Lorenz K Muller
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-muller24a
%I PMLR
%P 36591--36604
%U https://proceedings.mlr.press/v235/muller24a.html
%V 235
APA
Muller, L. K. (2024). Is Kernel Prediction More Powerful than Gating in Convolutional Neural Networks? Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:36591-36604. Available from https://proceedings.mlr.press/v235/muller24a.html.
