Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search

Vu Nguyen, Tam Le, Makoto Yamada, Michael A. Osborne
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8084-8095, 2021.

Abstract

Neural architecture search (NAS) automates the design of deep neural networks. One of the main challenges in searching complex and non-continuous architectures is comparing the similarity of networks, which the conventional Euclidean metric may fail to capture. Optimal transport (OT) is resilient to such complex structures, as it considers the minimal cost of transporting one network into another. However, OT is generally not negative definite, which may prevent it from building the positive-definite kernels required in many kernel-dependent frameworks. Building upon tree-Wasserstein (TW), which is a negative definite variant of OT, we develop a novel discrepancy for neural architectures and demonstrate it within a Gaussian process surrogate model for the sequential NAS setting. Furthermore, we derive a novel parallel NAS method, using a quality k-determinantal point process on the GP posterior, to select diverse and high-performing architectures from a discrete set of candidates. Empirically, we demonstrate that our TW-based approaches outperform other baselines in both sequential and parallel NAS.
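
The sketch below is a minimal illustration of the pipeline the abstract describes, not the authors' implementation. It assumes each architecture is summarized as a histogram of operation counts over a fixed operation vocabulary, uses a simple chain-tree ground metric as a stand-in for the paper's tree-Wasserstein discrepancy, plugs the resulting exponential kernel into a Gaussian process posterior, and greedily approximates a quality-weighted k-DPP to select a diverse, high-scoring batch. All function names (tree_wasserstein, tw_kernel, gp_posterior, quality_k_dpp_greedy) and the toy data are hypothetical.

import numpy as np

def tree_wasserstein(p, q, edge_weights):
    # TW distance for histograms supported on a chain tree:
    # sum over edges of (edge weight) * |mass difference crossing that edge|.
    diff = np.cumsum(p - q)[:-1]          # mass crossing each internal edge
    return float(np.sum(edge_weights * np.abs(diff)))

def tw_kernel(X, Y, edge_weights, lengthscale=1.0):
    # exp(-TW / lengthscale); since TW is negative definite, this kernel is PSD.
    K = np.zeros((len(X), len(Y)))
    for i, p in enumerate(X):
        for j, q in enumerate(Y):
            K[i, j] = np.exp(-tree_wasserstein(p, q, edge_weights) / lengthscale)
    return K

def gp_posterior(X_train, y_train, X_cand, edge_weights, noise=1e-3):
    # Standard GP regression with the TW kernel as covariance.
    K = tw_kernel(X_train, X_train, edge_weights) + noise * np.eye(len(X_train))
    Ks = tw_kernel(X_cand, X_train, edge_weights)
    Kss = tw_kernel(X_cand, X_cand, edge_weights)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    cov = Kss - V.T @ V
    return mu, cov

def quality_k_dpp_greedy(quality, similarity, k):
    # Greedy MAP for a quality-weighted k-DPP with L = diag(q) S diag(q):
    # repeatedly add the candidate that maximizes the log-determinant of the
    # selected submatrix, trading off quality against diversity.
    L = np.outer(quality, quality) * similarity
    selected, remaining = [], list(range(len(quality)))
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_ops = 5                                      # size of the operation vocabulary
    edge_w = np.ones(n_ops - 1)                    # chain-tree edge weights
    archs = rng.dirichlet(np.ones(n_ops), size=20) # toy architecture histograms
    accs = rng.uniform(0.85, 0.95, size=8)         # toy validation accuracies
    mu, cov = gp_posterior(archs[:8], accs, archs[8:], edge_w)
    ucb = mu + np.sqrt(np.clip(np.diag(cov), 0.0, None))  # quality score (UCB)
    sim = tw_kernel(archs[8:], archs[8:], edge_w)          # diversity via TW kernel
    batch = quality_k_dpp_greedy(ucb, sim, k=3)
    print("selected candidate indices:", batch)

In the sequential setting the GP posterior alone would drive acquisition (e.g., picking the single candidate with the highest UCB score); the k-DPP step is what turns the same posterior into a parallel batch of diverse, high-performing candidates.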

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-nguyen21d,
  title     = {Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search},
  author    = {Nguyen, Vu and Le, Tam and Yamada, Makoto and Osborne, Michael A.},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8084--8095},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/nguyen21d/nguyen21d.pdf},
  url       = {https://proceedings.mlr.press/v139/nguyen21d.html},
  abstract  = {Neural architecture search (NAS) automates the design of deep neural networks. One of the main challenges in searching complex and non-continuous architectures is to compare the similarity of networks that the conventional Euclidean metric may fail to capture. Optimal transport (OT) is resilient to such complex structure by considering the minimal cost for transporting a network into another. However, the OT is generally not negative definite which may limit its ability to build the positive-definite kernels required in many kernel-dependent frameworks. Building upon tree-Wasserstein (TW), which is a negative definite variant of OT, we develop a novel discrepancy for neural architectures, and demonstrate it within a Gaussian process surrogate model for the sequential NAS settings. Furthermore, we derive a novel parallel NAS, using quality k-determinantal point process on the GP posterior, to select diverse and high-performing architectures from a discrete set of candidates. Empirically, we demonstrate that our TW-based approaches outperform other baselines in both sequential and parallel NAS.}
}
Endnote
%0 Conference Paper
%T Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search
%A Vu Nguyen
%A Tam Le
%A Makoto Yamada
%A Michael A. Osborne
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-nguyen21d
%I PMLR
%P 8084--8095
%U https://proceedings.mlr.press/v139/nguyen21d.html
%V 139
%X Neural architecture search (NAS) automates the design of deep neural networks. One of the main challenges in searching complex and non-continuous architectures is to compare the similarity of networks that the conventional Euclidean metric may fail to capture. Optimal transport (OT) is resilient to such complex structure by considering the minimal cost for transporting a network into another. However, the OT is generally not negative definite which may limit its ability to build the positive-definite kernels required in many kernel-dependent frameworks. Building upon tree-Wasserstein (TW), which is a negative definite variant of OT, we develop a novel discrepancy for neural architectures, and demonstrate it within a Gaussian process surrogate model for the sequential NAS settings. Furthermore, we derive a novel parallel NAS, using quality k-determinantal point process on the GP posterior, to select diverse and high-performing architectures from a discrete set of candidates. Empirically, we demonstrate that our TW-based approaches outperform other baselines in both sequential and parallel NAS.
APA
Nguyen, V., Le, T., Yamada, M., & Osborne, M. A. (2021). Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8084-8095. Available from https://proceedings.mlr.press/v139/nguyen21d.html.