An Adaptive Tangent Feature Perspective of Neural Networks

Daniel LeJeune, Sina Alemohammad
Conference on Parsimony and Learning, PMLR 234:379-394, 2024.

Abstract

In order to better understand feature learning in neural networks, we propose and study linear models in tangent feature space where the features are allowed to be transformed during training. We consider linear feature transformations, resulting in a joint optimization over parameters and transformations with a bilinear interpolation constraint. We show that this relaxed optimization problem has an equivalent linearly constrained optimization with structured regularization that encourages approximately low rank solutions. Specializing to structures arising in neural networks, we gain insights into how the features and thus the kernel function change, providing additional nuance to the phenomenon of kernel alignment when the target function is poorly represented by tangent features. In addition to verifying our theoretical observations in real neural networks on a simple regression problem, we empirically show that an adaptive feature implementation of tangent feature classification has an order of magnitude lower sample complexity than the fixed tangent feature model on MNIST and CIFAR-10.
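To make the setup described in the abstract concrete, the following is a minimal worked sketch of an adaptive tangent feature model. The notation (features φ, transformation A, weights w, nuclear-norm regularizer) is our own illustrative assumption, not taken from the paper; in particular, the choice of low-rank-encouraging penalty is a plausible stand-in for the paper's "structured regularization."

\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Illustrative sketch (notation assumed, not the paper's).
Tangent features are the parameter gradients of the network at
initialization $\theta_0$:
\[
  \phi(x) = \nabla_\theta f_{\theta_0}(x) \in \mathbb{R}^p .
\]
An adaptive tangent feature model applies a learned linear
transformation $A$ to these features:
\[
  f(x) = \langle w, (I + A)\,\phi(x) \rangle ,
\]
and trains $w$ and $A$ jointly subject to interpolating the data,
\[
  \min_{w,\,A}\; \|w\|_2^2 + \lambda \|A\|_*
  \quad \text{s.t.} \quad
  \langle w, (I + A)\,\phi(x_i) \rangle = y_i ,\; i = 1, \dots, n .
\]
The constraint is bilinear in $(w, A)$, matching the abstract's
``bilinear interpolation constraint,'' and a penalty such as the
nuclear norm $\|A\|_*$ encourages approximately low-rank feature
transformations.
\end{document}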

Cite this Paper
BibTeX
@InProceedings{pmlr-v234-lejeune24a,
  title = {An Adaptive Tangent Feature Perspective of Neural Networks},
  author = {LeJeune, Daniel and Alemohammad, Sina},
  booktitle = {Conference on Parsimony and Learning},
  pages = {379--394},
  year = {2024},
  editor = {Chi, Yuejie and Dziugaite, Gintare Karolina and Qu, Qing and Wang, Atlas and Zhu, Zhihui},
  volume = {234},
  series = {Proceedings of Machine Learning Research},
  month = {03--06 Jan},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v234/lejeune24a/lejeune24a.pdf},
  url = {https://proceedings.mlr.press/v234/lejeune24a.html},
  abstract = {In order to better understand feature learning in neural networks, we propose and study linear models in tangent feature space where the features are allowed to be transformed during training. We consider linear feature transformations, resulting in a joint optimization over parameters and transformations with a bilinear interpolation constraint. We show that this relaxed optimization problem has an equivalent linearly constrained optimization with structured regularization that encourages approximately low rank solutions. Specializing to structures arising in neural networks, we gain insights into how the features and thus the kernel function change, providing additional nuance to the phenomenon of kernel alignment when the target function is poorly represented by tangent features. In addition to verifying our theoretical observations in real neural networks on a simple regression problem, we empirically show that an adaptive feature implementation of tangent feature classification has an order of magnitude lower sample complexity than the fixed tangent feature model on MNIST and CIFAR-10.}
}
Endnote
%0 Conference Paper
%T An Adaptive Tangent Feature Perspective of Neural Networks
%A Daniel LeJeune
%A Sina Alemohammad
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Yuejie Chi
%E Gintare Karolina Dziugaite
%E Qing Qu
%E Atlas Wang
%E Zhihui Zhu
%F pmlr-v234-lejeune24a
%I PMLR
%P 379--394
%U https://proceedings.mlr.press/v234/lejeune24a.html
%V 234
%X In order to better understand feature learning in neural networks, we propose and study linear models in tangent feature space where the features are allowed to be transformed during training. We consider linear feature transformations, resulting in a joint optimization over parameters and transformations with a bilinear interpolation constraint. We show that this relaxed optimization problem has an equivalent linearly constrained optimization with structured regularization that encourages approximately low rank solutions. Specializing to structures arising in neural networks, we gain insights into how the features and thus the kernel function change, providing additional nuance to the phenomenon of kernel alignment when the target function is poorly represented by tangent features. In addition to verifying our theoretical observations in real neural networks on a simple regression problem, we empirically show that an adaptive feature implementation of tangent feature classification has an order of magnitude lower sample complexity than the fixed tangent feature model on MNIST and CIFAR-10.
APA
LeJeune, D. & Alemohammad, S. (2024). An Adaptive Tangent Feature Perspective of Neural Networks. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 234:379-394. Available from https://proceedings.mlr.press/v234/lejeune24a.html.
