“No Free Lunch” in Neural Architectures? A Joint Analysis of Expressivity, Convergence, and Generalization

Wuyang Chen, Wei Huang, Zhangyang Wang
Proceedings of the Second International Conference on Automated Machine Learning, PMLR 224:14/1-29, 2023.

Abstract

The prosperity of deep learning and automated machine learning (AutoML) is largely rooted in the development of novel neural networks – but what defines and controls the “goodness” of networks in an architecture space? Test accuracy, the gold standard in AutoML, is closely related to three aspects: (1) expressivity (how complicated a function the network can approximate over the training data); (2) convergence (how fast the network reaches low training error under gradient descent); (3) generalization (whether a trained network generalizes from the training data to unseen samples with low test error). However, most previous theory papers focus on fixed model structures, largely ignoring the sophisticated networks used in practice. To facilitate the interpretation and understanding of architecture design by AutoML, we aim to connect a bigger picture: how does an architecture jointly impact its expressivity, convergence, and generalization? We demonstrate a “no free lunch” behavior for networks in an architecture space: given a fixed budget on the number of parameters, no single architecture is optimal in all three aspects. In other words, separately optimizing expressivity, convergence, and generalization yields different networks in the architecture space. Our analysis explains a wide range of observations in AutoML. Experiments on popular benchmarks confirm our theoretical analysis. Our code is attached in the supplement.
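To make the “no free lunch” claim concrete, the following is a minimal, purely illustrative Python sketch (not the paper's actual analysis or metrics): it assigns hypothetical proxy scores for expressivity, convergence, and generalization to a few same-budget candidate architectures, and shows that optimizing each criterion separately selects a different architecture. All names and numbers are made up for illustration.

# Hypothetical candidate architectures under the same parameter budget.
# Each entry: (expressivity proxy, convergence-speed proxy, generalization proxy).
# These scores are invented solely to illustrate the "no free lunch" behavior.
candidates = {
    "arch_A": (0.92, 0.40, 0.55),   # very expressive, but slow to reach low training error
    "arch_B": (0.60, 0.95, 0.50),   # converges fast, but less expressive
    "arch_C": (0.55, 0.50, 0.90),   # generalizes well, but limited expressivity
}

aspects = ["expressivity", "convergence", "generalization"]

# Optimize each aspect separately and record which architecture wins.
best_per_aspect = {}
for i, aspect in enumerate(aspects):
    best_per_aspect[aspect] = max(candidates, key=lambda name: candidates[name][i])

print(best_per_aspect)
# {'expressivity': 'arch_A', 'convergence': 'arch_B', 'generalization': 'arch_C'}

# "No free lunch": no single candidate is optimal in all three aspects at once.
assert len(set(best_per_aspect.values())) > 1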

Cite this Paper


BibTeX
@InProceedings{pmlr-v224-chen23a,
  title     = {“No Free Lunch” in Neural Architectures? A Joint Analysis of Expressivity, Convergence, and Generalization},
  author    = {Chen, Wuyang and Huang, Wei and Wang, Zhangyang},
  booktitle = {Proceedings of the Second International Conference on Automated Machine Learning},
  pages     = {14/1--29},
  year      = {2023},
  editor    = {Faust, Aleksandra and Garnett, Roman and White, Colin and Hutter, Frank and Gardner, Jacob R.},
  volume    = {224},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v224/chen23a/chen23a.pdf},
  url       = {https://proceedings.mlr.press/v224/chen23a.html}
}
Endnote
%0 Conference Paper
%T “No Free Lunch” in Neural Architectures? A Joint Analysis of Expressivity, Convergence, and Generalization
%A Wuyang Chen
%A Wei Huang
%A Zhangyang Wang
%B Proceedings of the Second International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Aleksandra Faust
%E Roman Garnett
%E Colin White
%E Frank Hutter
%E Jacob R. Gardner
%F pmlr-v224-chen23a
%I PMLR
%P 14/1--29
%U https://proceedings.mlr.press/v224/chen23a.html
%V 224
APA
Chen, W., Huang, W. & Wang, Z. (2023). “No Free Lunch” in Neural Architectures? A Joint Analysis of Expressivity, Convergence, and Generalization. Proceedings of the Second International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 224:14/1-29. Available from https://proceedings.mlr.press/v224/chen23a.html.