Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2641-2649, 2019.
The key to the recent success of coordinate descent (CD) in many applications is to maintain a set of auxiliary variables to facilitate efficient single variable updates. For example, the vector of residual/primal variables has to be maintained when CD is applied for Lasso/linear SVM, respectively. An implementation without maintenance is $O(n)$ times slower than the one with maintenance, where n is the number of variables. In serial implementations, maintaining auxiliary variables is only a computing trick without changing the behavior of coordinate descent. However, maintenance of auxiliary variables is non-trivial when there are multiple threads/workers which read/write the auxiliary variables concurrently. Thus, most existing theoretical analysis of parallel CD either assumes vanilla CD without auxiliary variables (which ends up being extremely slow in practice) or limits to a small class of problems. In this paper, we consider a rich family of objective functions where AUX-PCD can be applied. We also establish global linear convergence for AUX-PCD with atomic operations for a general family of functions and perform a complete backward error analysis of AUX-PCD with wild updates, where some updates are not just delayed but lost because of memory conflicts. Our results enable us to provide theoretical guarantees for many practical parallel coordinate descent implementations, which currently lack guarantees (such as the implementation of Shotgun by Bradley et al. 2011, which uses auxiliary variables)