Distributed optimization of deeply nested systems
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:10-19, 2014.
Intelligent processing of complex signals such as images is often performed by a hierarchy of nonlinear processing layers, such as a deep net or an object recognition cascade. Joint estimation of the parameters of all the layers is a difficult nonconvex optimization problem. We describe a general strategy to learn the parameters and, to some extent, the architecture of nested systems, which we call the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, can perform some model selection on the fly, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.
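To make the alternating scheme concrete, the following is a minimal sketch under stated assumptions, not the implementation used in the paper. It assumes a toy scalar model y ≈ w2 · tanh(w1 · x), a quadratic penalty with a fixed weight `mu` (a penalty method would normally increase `mu` over the iterations), and one auxiliary coordinate `z` per data point standing in for the hidden-unit output. The function names, the initial values, and the backtracking step used to fit the hidden unit are all illustrative choices.

```python
import math


def penalized_objective(xs, ys, w1, w2, zs, mu):
    """Quadratic-penalty objective: output fit plus mu-weighted constraint gap."""
    fit = sum((y - w2 * z) ** 2 for y, z in zip(ys, zs))
    gap = sum((z - math.tanh(w1 * x)) ** 2 for x, z in zip(xs, zs))
    return 0.5 * fit + 0.5 * mu * gap


def fit_hidden_unit(xs, zs, w1, n_steps=5):
    """Single-layer fit of tanh(w1 * x) to the targets z by gradient steps.

    A step is accepted only if it lowers the loss (simple backtracking).
    """
    def loss(w):
        return 0.5 * sum((math.tanh(w * x) - z) ** 2 for x, z in zip(xs, zs))

    for _ in range(n_steps):
        grad = sum(
            (math.tanh(w1 * x) - z) * (1.0 - math.tanh(w1 * x) ** 2) * x
            for x, z in zip(xs, zs)
        )
        step, current = 1.0, loss(w1)
        for _ in range(30):
            candidate = w1 - step * grad
            if loss(candidate) < current:
                w1 = candidate
                break
            step *= 0.5
    return w1


def mac_two_layer(xs, ys, w1=0.5, w2=0.5, mu=1.0, n_iters=20):
    """Alternate between a parameter (W) step and a coordinate (Z) step."""
    # Start the auxiliary coordinates at the forward pass of the hidden unit.
    zs = [math.tanh(w1 * x) for x in xs]
    history = [penalized_objective(xs, ys, w1, w2, zs, mu)]
    for _ in range(n_iters):
        # W step: with z fixed, the two layers decouple into single-layer fits.
        denom = sum(z * z for z in zs)
        if denom > 0.0:
            w2 = sum(z * y for z, y in zip(zs, ys)) / denom  # least squares
        w1 = fit_hidden_unit(xs, zs, w1)
        # Z step: closed-form minimizer for each point, independent of the
        # other points, so this loop could run in parallel.
        zs = [
            (w2 * y + mu * math.tanh(w1 * x)) / (w2 * w2 + mu)
            for x, y in zip(xs, ys)
        ]
        history.append(penalized_objective(xs, ys, w1, w2, zs, mu))
    # Error of the original nested model, without auxiliary coordinates.
    nested_error = 0.5 * sum(
        (y - w2 * math.tanh(w1 * x)) ** 2 for x, y in zip(xs, ys)
    )
    return w1, w2, zs, history, nested_error
```

In this sketch neither step differentiates through the nested function: the output weight is an ordinary least-squares fit to the coordinates, the hidden weight is fitted as a single layer, and each coordinate has its own closed-form update. Because every update is either an exact minimizer or an accepted descent step, the penalized objective does not increase from one iteration to the next while `mu` is held fixed.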