Generalization Guarantees for Neural Architecture Search with Train-Validation Split

Samet Oymak; Mingchen Li; Mahdi Soltanolkotabi

Generalization Guarantees for Neural Architecture Search with Train-Validation Split

Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8291-8301, 2021.

Abstract

Neural Architecture Search (NAS) is a popular method for automatically designing optimized deep-learning architectures. NAS methods commonly use bilevel optimization where one optimizes the weights over the training data (lower-level problem) and hyperparameters - such as the architecture - over the validation data (upper-level problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the lower-level problem is often overparameterized and can easily achieve zero loss. Thus, a-priori, it seems impossible to distinguish the right hyperparameters based on training loss alone which motivates a better understanding of train-validation split. To this aim, we first show that refined properties of the validation loss such as risk and hyper-gradients are indicative of those of the true test loss and help prevent overfitting with a near-minimal validation sample size. Importantly, this is established for continuous search spaces which are relevant for differentiable search schemes. We then establish generalization bounds for NAS problems with an emphasis on an activation search problem and gradient-based methods. Finally, we show rigorous connections between NAS and low-rank matrix learning which leads to algorithmic insights where the solution of the upper problem can be accurately learned via spectral methods to achieve near-minimal risk.

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-oymak21a,
  title = 	 {Generalization Guarantees for Neural Architecture Search with Train-Validation Split},
  author =       {Oymak, Samet and Li, Mingchen and Soltanolkotabi, Mahdi},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {8291--8301},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/oymak21a/oymak21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/oymak21a.html},
  abstract = 	 {Neural Architecture Search (NAS) is a popular method for automatically designing optimized deep-learning architectures. NAS methods commonly use bilevel optimization where one optimizes the weights over the training data (lower-level problem) and hyperparameters - such as the architecture - over the validation data (upper-level problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the lower-level problem is often overparameterized and can easily achieve zero loss. Thus, a-priori, it seems impossible to distinguish the right hyperparameters based on training loss alone which motivates a better understanding of train-validation split. To this aim, we first show that refined properties of the validation loss such as risk and hyper-gradients are indicative of those of the true test loss and help prevent overfitting with a near-minimal validation sample size. Importantly, this is established for continuous search spaces which are relevant for differentiable search schemes. We then establish generalization bounds for NAS problems with an emphasis on an activation search problem and gradient-based methods. Finally, we show rigorous connections between NAS and low-rank matrix learning which leads to algorithmic insights where the solution of the upper problem can be accurately learned via spectral methods to achieve near-minimal risk.}
}

Endnote

%0 Conference Paper
%T Generalization Guarantees for Neural Architecture Search with Train-Validation Split
%A Samet Oymak
%A Mingchen Li
%A Mahdi Soltanolkotabi
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-oymak21a
%I PMLR
%P 8291--8301
%U https://proceedings.mlr.press/v139/oymak21a.html
%V 139
%X Neural Architecture Search (NAS) is a popular method for automatically designing optimized deep-learning architectures. NAS methods commonly use bilevel optimization where one optimizes the weights over the training data (lower-level problem) and hyperparameters - such as the architecture - over the validation data (upper-level problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the lower-level problem is often overparameterized and can easily achieve zero loss. Thus, a-priori, it seems impossible to distinguish the right hyperparameters based on training loss alone which motivates a better understanding of train-validation split. To this aim, we first show that refined properties of the validation loss such as risk and hyper-gradients are indicative of those of the true test loss and help prevent overfitting with a near-minimal validation sample size. Importantly, this is established for continuous search spaces which are relevant for differentiable search schemes. We then establish generalization bounds for NAS problems with an emphasis on an activation search problem and gradient-based methods. Finally, we show rigorous connections between NAS and low-rank matrix learning which leads to algorithmic insights where the solution of the upper problem can be accurately learned via spectral methods to achieve near-minimal risk.

APA


Oymak, S., Li, M. & Soltanolkotabi, M.. (2021). Generalization Guarantees for Neural Architecture Search with Train-Validation Split. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8291-8301 Available from https://proceedings.mlr.press/v139/oymak21a.html.

Generalization Guarantees for Neural Architecture Search with Train-Validation Split

Abstract

Cite this Paper

Related Material