Frequency Bias in Neural Networks for Input of Non-Uniform Density

Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, Shira Kritchman
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:685-694, 2020.

Abstract

Recent works have partly attributed the generalization ability of over-parameterized neural networks to frequency bias – networks trained with gradient descent on data drawn from a uniform distribution find a low frequency fit before high frequency ones. As realistic training sets are not drawn from a uniform distribution, we here use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics. Our results, which combine analytic and empirical observations, show that when learning a pure harmonic function of frequency $\kappa$, convergence at a point $x \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d/p(x))$ where $p(x)$ denotes the local density at $x$. Specifically, for data in $\mathbb{S}^1$ we analytically derive the eigenfunctions of the kernel associated with the NTK for two-layer networks. We further prove convergence results for deep, fully connected networks with respect to the spectral decomposition of the NTK. Our empirical study highlights similarities and differences between deep and shallow networks in this model.
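As a rough illustration of the phenomenon the abstract describes, the sketch below (not the authors' code) builds an NTK-style Gram matrix on points sampled non-uniformly from $\mathbb{S}^1$ and simulates kernel gradient flow on a pure harmonic target. The kernel used is the standard closed form of the bias-free two-layer ReLU NTK; the normalization, the sampling scheme, and names such as ntk_two_layer are assumptions for illustration only. In line with the $O(\kappa^d/p(x))$ convergence time, the residual at points in the densely sampled half of the circle should shrink noticeably sooner than in the sparse half.

import numpy as np

def ntk_two_layer(X):
    """NTK Gram matrix for unit vectors X (n x 2) on S^1, bias-free two-layer ReLU net.
    Closed form up to normalization: K(x, x') = u*k0(u) + k1(u), u = <x, x'>."""
    u = np.clip(X @ X.T, -1.0, 1.0)
    theta = np.arccos(u)
    k0 = (np.pi - theta) / np.pi                               # arc-cosine kernel, degree 0
    k1 = (u * (np.pi - theta) + np.sqrt(1.0 - u**2)) / np.pi   # arc-cosine kernel, degree 1
    return u * k0 + k1

# Non-uniform sample of angles: one half of the circle is sampled 4x as densely.
rng = np.random.default_rng(0)
n = 400
dense = rng.uniform(0.0, np.pi, int(0.8 * n))
sparse = rng.uniform(np.pi, 2.0 * np.pi, n - int(0.8 * n))
angles = np.concatenate([dense, sparse])
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)

kappa = 4                         # frequency of the pure harmonic target
y = np.sin(kappa * angles)

K = ntk_two_layer(X)
eigvals, eigvecs = np.linalg.eigh(K)

def residual(t):
    """Kernel gradient flow from a zero initial function: r(t) = V exp(-Lambda t) V^T y."""
    decay = np.exp(-eigvals * t)
    return eigvecs @ (decay * (eigvecs.T @ y))

for t in [1, 10, 100, 1000]:
    r = residual(t)
    mse_dense = np.mean(r[:len(dense)] ** 2)
    mse_sparse = np.mean(r[len(dense):] ** 2)
    print(f"t={t:5d}  MSE dense region={mse_dense:.4f}  sparse region={mse_sparse:.4f}")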

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-basri20a,
  title     = {Frequency Bias in Neural Networks for Input of Non-Uniform Density},
  author    = {Basri, Ronen and Galun, Meirav and Geifman, Amnon and Jacobs, David and Kasten, Yoni and Kritchman, Shira},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {685--694},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/basri20a/basri20a.pdf},
  url       = {https://proceedings.mlr.press/v119/basri20a.html}
}
Endnote
%0 Conference Paper
%T Frequency Bias in Neural Networks for Input of Non-Uniform Density
%A Ronen Basri
%A Meirav Galun
%A Amnon Geifman
%A David Jacobs
%A Yoni Kasten
%A Shira Kritchman
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-basri20a
%I PMLR
%P 685--694
%U https://proceedings.mlr.press/v119/basri20a.html
%V 119
APA
Basri, R., Galun, M., Geifman, A., Jacobs, D., Kasten, Y. & Kritchman, S. (2020). Frequency Bias in Neural Networks for Input of Non-Uniform Density. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:685-694. Available from https://proceedings.mlr.press/v119/basri20a.html.
