Deep Neural Networks with Multi-Branch Architectures Are Intrinsically Less Non-Convex
Proceedings of Machine Learning Research, PMLR 89:1099-1109, 2019.
Abstract
Several recently proposed architectures of neural networks, such as ResNeXt, Inception, Xception, SqueezeNet and Wide ResNet, are based on the design idea of having multiple branches and have demonstrated improved performance in many applications. We show that one cause of this success is that the multi-branch architecture is less non-convex in terms of the duality gap. The duality gap measures the degree of intrinsic non-convexity of an optimization problem: a smaller gap in relative value implies a lower degree of intrinsic non-convexity. The challenge is to quantitatively measure the duality gap of highly non-convex problems such as deep neural networks. In this work, we provide strong guarantees on this quantity for two classes of network architectures. For neural networks with arbitrary activation functions, a multi-branch architecture and a variant of the hinge loss, we show that the duality gap of both population and empirical risks shrinks to zero as the number of branches increases. This result sheds light on better understanding the power of over-parametrization, where increasing the number of branches tends to make the loss surface less non-convex. For neural networks with a linear activation function and the $\ell_2$ loss, we show that the duality gap of the empirical risk is zero. Both results hold for arbitrary depths, and the analytical techniques may be of independent interest to non-convex optimization more broadly. Experiments on both synthetic and real-world datasets validate our results.
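The abstract's central claim can be illustrated with a toy computation (a sketch under illustrative assumptions, not the paper's actual construction or proof): averaging a non-convex per-branch loss over more branches drives the resulting objective toward its convex envelope, so the non-convexity gap shrinks as branches are added. The double-well loss `g`, the grid, and the `nonconvexity_gap` helper below are all hypothetical choices made for the demonstration.

```python
import numpy as np

# Illustrative setup (not from the paper): a non-convex per-branch
# loss g on a grid, and the I-branch objective
#   g_I(x) = min over branch inputs x_1..x_I averaging to x
#            of the mean loss (1/I) * sum g(x_k),
# which approaches the convex envelope of g as I grows.
xs = np.linspace(-1.5, 1.5, 61)        # uniform grid, step 0.05
g = (xs**2 - 1.0)**2                   # double-well, non-convex

def convex_envelope(x):
    # Closed-form convex envelope of (x^2 - 1)^2: zero on [-1, 1],
    # equal to g outside (g is convex there, with zero slope at +-1).
    return np.where(np.abs(x) <= 1.0, 0.0, (x**2 - 1.0)**2)

def nonconvexity_gap(I):
    # Dynamic program (I-fold min-plus convolution on the grid):
    # best[s] = min over grid indices j_1..j_I with sum s
    #           of g[j_1] + ... + g[j_I].
    best = g.copy()
    for _ in range(I - 1):
        new = np.full(len(best) + len(g) - 1, np.inf)
        for j in range(len(g)):
            new[j:j + len(best)] = np.minimum(new[j:j + len(best)],
                                              best + g[j])
        best = new
    s = np.arange(len(best))
    mean_x = xs[0] + (s / I) * (xs[1] - xs[0])   # average branch input
    # Largest deviation of the I-branch objective from the envelope.
    return float(np.max(best / I - convex_envelope(mean_x)))

# The gap decreases monotonically toward zero as branches are added.
gaps = [nonconvexity_gap(I) for I in (1, 2, 3, 6)]
```

Here the single-branch gap equals the full non-convexity of `g`, and each additional branch lets the minimization split its "budget" across the two wells, flattening the objective toward the envelope, a one-dimensional analogue of how the paper's duality gap shrinks with the number of branches.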