Exploring the loss landscape in neural architecture search
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:654-664, 2021.
Neural architecture search (NAS) has seen a steep rise in interest over the last few years. Many algorithms for NAS consist of searching through a space of architectures by iteratively choosing an architecture, evaluating its performance by training it, and using all prior evaluations to come up with the next choice. The evaluation step is noisy - the final accuracy varies based on the random initialization of the weights. Prior work has focused on devising new search algorithms to handle this noise, rather than quantifying or understanding the level of noise in architecture evaluations. In this work, we show that (1) the simplest hill-climbing algorithm is a powerful baseline for NAS, and (2), when the noise in popular NAS benchmark datasets is reduced to a minimum, the loss landscape becomes near-convex, causing hill-climbing to outperform many popular state-of-the-art algorithms. We further back up this observation by showing that the number of local minima is substantially reduced as the noise decreases and by giving a theoretical characterization of the performance of local search in NAS. Based on our findings, for NAS research we suggest (1) using local search as a baseline, and (2) denoising the training pipeline when possible.