StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent

Tyler B. Johnson, Carlos Guestrin
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1752-1760, 2017.

Abstract

Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems in machine learning. Despite this fact, CD can also be very computationally wasteful. Due to sparsity in sparse regression problems, for example, the majority of CD updates often result in no progress toward the solution. To address this inefficiency, we propose a modified CD algorithm named “StingyCD.” By skipping over many updates that are guaranteed to not decrease the objective value, StingyCD significantly reduces convergence times. Since StingyCD only skips updates with this guarantee, however, StingyCD does not fully exploit the problem’s sparsity. For this reason, we also propose StingyCD+, an algorithm that achieves further speed-ups by skipping updates more aggressively. Since StingyCD and StingyCD+ rely on simple modifications to CD, it is also straightforward to use these algorithms with other approaches to scaling optimization. In empirical comparisons, StingyCD and StingyCD+ improve convergence times considerably for several L1-regularized optimization problems.
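To make the skipping idea concrete, below is a minimal NumPy sketch of coordinate descent for the Lasso with a conservative "stingy" skip test. The test shown uses a simple Cauchy-Schwarz bound around a cached reference residual: it is safe in the paper's sense (every skipped update is provably a no-op), but it is an illustrative stand-in rather than StingyCD's exact test, and all names here (stingy_style_lasso_cd, the refresh threshold, etc.) are hypothetical.

import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t*|.|
    return np.sign(z) * max(abs(z) - t, 0.0)

def stingy_style_lasso_cd(A, b, lam, n_epochs=100):
    """Cyclic CD for min_x 0.5*||Ax - b||^2 + lam*||x||_1, with a
    conservative skip test inspired by (but not identical to) StingyCD."""
    d = A.shape[1]
    x = np.zeros(d)
    r = b.astype(float).copy()       # residual r = b - A x
    col_sq = (A ** 2).sum(axis=0)    # ||A_i||^2 for each column
    col_nrm = np.sqrt(col_sq)

    # Reference residual and cached correlations c_i = A_i^T r_ref.
    r_ref = r.copy()
    c = A.T @ r_ref
    dist_sq = 0.0                    # ||r - r_ref||^2, maintained in O(1)

    for _ in range(n_epochs):
        for i in range(d):
            if col_sq[i] == 0.0:
                continue
            # Safe skip: when x_i = 0, the update leaves x_i at zero
            # iff |A_i^T r| <= lam. By Cauchy-Schwarz,
            # |A_i^T r| <= |c_i| + ||A_i|| * ||r - r_ref||,
            # so if that bound is <= lam the update cannot make progress
            # and the O(n) dot product A_i^T r can be avoided entirely.
            if x[i] == 0.0 and abs(c[i]) + col_nrm[i] * np.sqrt(max(dist_sq, 0.0)) <= lam:
                continue

            a_dot_r = A[:, i] @ r
            x_new = soft_threshold(x[i] + a_dot_r / col_sq[i], lam / col_sq[i])
            delta = x_new - x[i]
            if delta != 0.0:
                x[i] = x_new
                r -= delta * A[:, i]
                # O(1) update of ||r - r_ref||^2 after r <- r - delta*A_i:
                # dist_sq += delta^2*||A_i||^2 - 2*delta*(A_i^T r_old - c_i)
                dist_sq += delta ** 2 * col_sq[i] - 2.0 * delta * (a_dot_r - c[i])

        # Refresh the reference point occasionally so the ball stays small
        # and the skip test stays informative (heuristic schedule).
        if dist_sq > 1e-3 * (r @ r):
            r_ref = r.copy()
            c = A.T @ r_ref
            dist_sq = 0.0

    return x

The paper's actual test is tighter than this Cauchy-Schwarz bound (it reasons geometrically about a ball of residuals around the reference point, with the relevant quantities also updated in constant time), and StingyCD+ goes further by skipping updates that are only probably wasteful; see Sections 2-3 of the paper for the exact conditions.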

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-johnson17a,
  title     = {{S}tingy{CD}: Safely Avoiding Wasteful Updates in Coordinate Descent},
  author    = {Tyler B. Johnson and Carlos Guestrin},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {1752--1760},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/johnson17a/johnson17a.pdf},
  url       = {https://proceedings.mlr.press/v70/johnson17a.html},
  abstract  = {Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems in machine learning. Despite this fact, CD can also be very computationally wasteful. Due to sparsity in sparse regression problems, for example, the majority of CD updates often result in no progress toward the solution. To address this inefficiency, we propose a modified CD algorithm named “StingyCD.” By skipping over many updates that are guaranteed to not decrease the objective value, StingyCD significantly reduces convergence times. Since StingyCD only skips updates with this guarantee, however, StingyCD does not fully exploit the problem’s sparsity. For this reason, we also propose StingyCD+, an algorithm that achieves further speed-ups by skipping updates more aggressively. Since StingyCD and StingyCD+ rely on simple modifications to CD, it is also straightforward to use these algorithms with other approaches to scaling optimization. In empirical comparisons, StingyCD and StingyCD+ improve convergence times considerably for several L1-regularized optimization problems.}
}
Endnote
%0 Conference Paper
%T StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
%A Tyler B. Johnson
%A Carlos Guestrin
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-johnson17a
%I PMLR
%P 1752--1760
%U https://proceedings.mlr.press/v70/johnson17a.html
%V 70
%X Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems in machine learning. Despite this fact, CD can also be very computationally wasteful. Due to sparsity in sparse regression problems, for example, the majority of CD updates often result in no progress toward the solution. To address this inefficiency, we propose a modified CD algorithm named “StingyCD.” By skipping over many updates that are guaranteed to not decrease the objective value, StingyCD significantly reduces convergence times. Since StingyCD only skips updates with this guarantee, however, StingyCD does not fully exploit the problem’s sparsity. For this reason, we also propose StingyCD+, an algorithm that achieves further speed-ups by skipping updates more aggressively. Since StingyCD and StingyCD+ rely on simple modifications to CD, it is also straightforward to use these algorithms with other approaches to scaling optimization. In empirical comparisons, StingyCD and StingyCD+ improve convergence times considerably for several L1-regularized optimization problems.
APA
Johnson, T.B. & Guestrin, C. (2017). StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1752-1760. Available from https://proceedings.mlr.press/v70/johnson17a.html.