Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables

Hsiang-Fu Yu; Cho-Jui Hsieh; Inderjit S. Dhillon

Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables

Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2641-2649, 2019.

Abstract

The key to the recent success of coordinate descent (CD) in many applications is to maintain a set of auxiliary variables to facilitate efficient single variable updates. For example, the vector of residual/primal variables has to be maintained when CD is applied for Lasso/linear SVM, respectively. An implementation without maintenance is $O(n)$ times slower than the one with maintenance, where n is the number of variables. In serial implementations, maintaining auxiliary variables is only a computing trick without changing the behavior of coordinate descent. However, maintenance of auxiliary variables is non-trivial when there are multiple threads/workers which read/write the auxiliary variables concurrently. Thus, most existing theoretical analysis of parallel CD either assumes vanilla CD without auxiliary variables (which ends up being extremely slow in practice) or limits to a small class of problems. In this paper, we consider a rich family of objective functions where AUX-PCD can be applied. We also establish global linear convergence for AUX-PCD with atomic operations for a general family of functions and perform a complete backward error analysis of AUX-PCD with wild updates, where some updates are not just delayed but lost because of memory conflicts. Our results enable us to provide theoretical guarantees for many practical parallel coordinate descent implementations, which currently lack guarantees (such as the implementation of Shotgun by Bradley et al. 2011, which uses auxiliary variables)

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-yu19d,
  title = 	 {Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables},
  author =       {Yu, Hsiang-Fu and Hsieh, Cho-Jui and Dhillon, Inderjit S.},
  booktitle = 	 {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2641--2649},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = 	 {89},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v89/yu19d/yu19d.pdf},
  url = 	 {https://proceedings.mlr.press/v89/yu19d.html},
  abstract = 	 {The key to the recent success of coordinate descent (CD) in many applications is to maintain a set of auxiliary variables to facilitate efficient single variable updates. For example, the vector of residual/primal variables has to be maintained when CD is applied for Lasso/linear SVM, respectively. An implementation without maintenance is $O(n)$ times slower than the one with maintenance, where n is the number of variables. In serial implementations, maintaining auxiliary variables is only a computing trick without changing the behavior of coordinate descent. However, maintenance of auxiliary variables is non-trivial when there are multiple threads/workers which read/write the auxiliary variables concurrently. Thus, most existing theoretical analysis of parallel CD either assumes vanilla CD without auxiliary variables (which ends up being extremely slow in practice) or limits to a small class of problems. In this paper, we consider a rich family of objective functions where AUX-PCD can be applied. We also establish global linear convergence for AUX-PCD with atomic operations for a general family of functions and perform a complete backward error analysis of AUX-PCD with wild updates, where some updates are not just delayed but lost because of memory conflicts. Our results enable us to provide theoretical guarantees for many practical parallel coordinate descent implementations, which currently lack guarantees (such as the implementation of Shotgun by Bradley et al. 2011, which uses auxiliary variables)}
}

Endnote

%0 Conference Paper
%T Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables
%A Hsiang-Fu Yu
%A Cho-Jui Hsieh
%A Inderjit S. Dhillon
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama	
%F pmlr-v89-yu19d
%I PMLR
%P 2641--2649
%U https://proceedings.mlr.press/v89/yu19d.html
%V 89
%X The key to the recent success of coordinate descent (CD) in many applications is to maintain a set of auxiliary variables to facilitate efficient single variable updates. For example, the vector of residual/primal variables has to be maintained when CD is applied for Lasso/linear SVM, respectively. An implementation without maintenance is $O(n)$ times slower than the one with maintenance, where n is the number of variables. In serial implementations, maintaining auxiliary variables is only a computing trick without changing the behavior of coordinate descent. However, maintenance of auxiliary variables is non-trivial when there are multiple threads/workers which read/write the auxiliary variables concurrently. Thus, most existing theoretical analysis of parallel CD either assumes vanilla CD without auxiliary variables (which ends up being extremely slow in practice) or limits to a small class of problems. In this paper, we consider a rich family of objective functions where AUX-PCD can be applied. We also establish global linear convergence for AUX-PCD with atomic operations for a general family of functions and perform a complete backward error analysis of AUX-PCD with wild updates, where some updates are not just delayed but lost because of memory conflicts. Our results enable us to provide theoretical guarantees for many practical parallel coordinate descent implementations, which currently lack guarantees (such as the implementation of Shotgun by Bradley et al. 2011, which uses auxiliary variables)

APA


Yu, H., Hsieh, C. & Dhillon, I.S.. (2019). Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2641-2649 Available from https://proceedings.mlr.press/v89/yu19d.html.

Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables

Abstract

Cite this Paper

Related Material