Neural Kernels Without Tangents

Vaishaal Shankar, Alex Fang, Wenshuo Guo, Sara Fridovich-Keil, Jonathan Ragan-Kelley, Ludwig Schmidt, Benjamin Recht
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8614-8623, 2020.

Abstract

We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating “compositional” kernels from bags of features. We show that these operations correspond to many of the building blocks of “neural tangent kernels (NTK)”. Experimentally, we show that there is a correlation in test error between neural network architectures and the associated kernels. We construct a simple neural network architecture using only 3x3 convolutions, 2x2 average pooling, ReLU, and optimized with SGD and MSE loss that achieves 96% accuracy on CIFAR10, and whose corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small dataset regime. In particular, we find that compositional kernels outperform NTKs and neural networks outperform both kernel methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-shankar20a, title = {Neural Kernels Without Tangents}, author = {Shankar, Vaishaal and Fang, Alex and Guo, Wenshuo and Fridovich-Keil, Sara and Ragan-Kelley, Jonathan and Schmidt, Ludwig and Recht, Benjamin}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {8614--8623}, year = {2020}, editor = {Hal Daumé III and Aarti Singh}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/shankar20a/shankar20a.pdf}, url = { http://proceedings.mlr.press/v119/shankar20a.html }, abstract = {We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating “compositional” kernels from bags of features. We show that these operations correspond to many of the building blocks of “neural tangent kernels (NTK)”. Experimentally, we show that there is a correlation in test error between neural network architectures and the associated kernels. We construct a simple neural network architecture using only 3x3 convolutions, 2x2 average pooling, ReLU, and optimized with SGD and MSE loss that achieves 96% accuracy on CIFAR10, and whose corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small dataset regime. In particular, we find that compositional kernels outperform NTKs and neural networks outperform both kernel methods.} }
Endnote
%0 Conference Paper %T Neural Kernels Without Tangents %A Vaishaal Shankar %A Alex Fang %A Wenshuo Guo %A Sara Fridovich-Keil %A Jonathan Ragan-Kelley %A Ludwig Schmidt %A Benjamin Recht %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-shankar20a %I PMLR %P 8614--8623 %U http://proceedings.mlr.press/v119/shankar20a.html %V 119 %X We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating “compositional” kernels from bags of features. We show that these operations correspond to many of the building blocks of “neural tangent kernels (NTK)”. Experimentally, we show that there is a correlation in test error between neural network architectures and the associated kernels. We construct a simple neural network architecture using only 3x3 convolutions, 2x2 average pooling, ReLU, and optimized with SGD and MSE loss that achieves 96% accuracy on CIFAR10, and whose corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small dataset regime. In particular, we find that compositional kernels outperform NTKs and neural networks outperform both kernel methods.
APA
Shankar, V., Fang, A., Guo, W., Fridovich-Keil, S., Ragan-Kelley, J., Schmidt, L. & Recht, B.. (2020). Neural Kernels Without Tangents. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8614-8623 Available from http://proceedings.mlr.press/v119/shankar20a.html .

Related Material