A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization

Yutong Dai; Guanyi Wang; Frank E. Curtis; Daniel P. Robinson

A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization

Yutong Dai, Guanyi Wang, Frank E. Curtis, Daniel P. Robinson

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5107-5133, 2023.

Abstract

This paper introduces a new proximal stochastic gradient method with variance reduction and stabilization for minimizing the sum of a convex stochastic function and a group sparsity-inducing regularization function. Since the method may be viewed as a stabilized version of the recently proposed algorithm PStorm, we call our algorithm S-PStorm. Our analysis shows that S-PStorm has strong convergence results. In particular, we prove an upper bound on the number of iterations required by S-PStorm before its iterates correctly identify (with high probability) an optimal support (i.e., the zero and nonzero structure of an optimal solution). Most algorithms in the literature with such a support identification property use variance reduction techniques that require either periodically evaluating an exact gradient or storing a history of stochastic gradients. Unlike these methods, S-PStorm achieves variance reduction without requiring either of these, which is advantageous. Moreover, our support-identification result for S-PStorm shows that, with high probability, an optimal support will be identified correctly in all iterations with index above a threshold. We believe that this type of result is new to the literature since the few existing other results prove that the optimal support is identified with high probability at each iteration with a sufficiently large index (meaning that the optimal support might be identified in some iterations, but not in others). Numerical experiments on regularized logistic loss problems show that S-PStorm outperforms existing methods in various metrics that measure how efficiently and robustly iterates of an algorithm identify an optimal support.

Cite this Paper

BibTeX

@InProceedings{pmlr-v206-dai23a,
  title = 	 {A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization},
  author =       {Dai, Yutong and Wang, Guanyi and Curtis, Frank E. and Robinson, Daniel P.},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {5107--5133},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/dai23a/dai23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/dai23a.html},
  abstract = 	 {This paper introduces a new proximal stochastic gradient method with variance reduction and stabilization for minimizing the sum of a convex stochastic function and a group sparsity-inducing regularization function. Since the method may be viewed as a stabilized version of the recently proposed algorithm PStorm, we call our algorithm S-PStorm. Our analysis shows that S-PStorm has strong convergence results. In particular, we prove an upper bound on the number of iterations required by S-PStorm before its iterates correctly identify (with high probability) an optimal support (i.e., the zero and nonzero structure of an optimal solution). Most algorithms in the literature with such a support identification property use variance reduction techniques that require either periodically evaluating an exact gradient or storing a history of stochastic gradients. Unlike these methods, S-PStorm achieves variance reduction without requiring either of these, which is advantageous. Moreover, our support-identification result for S-PStorm shows that, with high probability, an optimal support will be identified correctly in all iterations with index above a threshold. We believe that this type of result is new to the literature since the few existing other results prove that the optimal support is identified with high probability at each iteration with a sufficiently large index (meaning that the optimal support might be identified in some iterations, but not in others). Numerical experiments on regularized logistic loss problems show that S-PStorm outperforms existing methods in various metrics that measure how efficiently and robustly iterates of an algorithm identify an optimal support.}
}

Endnote

%0 Conference Paper
%T A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization
%A Yutong Dai
%A Guanyi Wang
%A Frank E. Curtis
%A Daniel P. Robinson
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-dai23a
%I PMLR
%P 5107--5133
%U https://proceedings.mlr.press/v206/dai23a.html
%V 206
%X This paper introduces a new proximal stochastic gradient method with variance reduction and stabilization for minimizing the sum of a convex stochastic function and a group sparsity-inducing regularization function. Since the method may be viewed as a stabilized version of the recently proposed algorithm PStorm, we call our algorithm S-PStorm. Our analysis shows that S-PStorm has strong convergence results. In particular, we prove an upper bound on the number of iterations required by S-PStorm before its iterates correctly identify (with high probability) an optimal support (i.e., the zero and nonzero structure of an optimal solution). Most algorithms in the literature with such a support identification property use variance reduction techniques that require either periodically evaluating an exact gradient or storing a history of stochastic gradients. Unlike these methods, S-PStorm achieves variance reduction without requiring either of these, which is advantageous. Moreover, our support-identification result for S-PStorm shows that, with high probability, an optimal support will be identified correctly in all iterations with index above a threshold. We believe that this type of result is new to the literature since the few existing other results prove that the optimal support is identified with high probability at each iteration with a sufficiently large index (meaning that the optimal support might be identified in some iterations, but not in others). Numerical experiments on regularized logistic loss problems show that S-PStorm outperforms existing methods in various metrics that measure how efficiently and robustly iterates of an algorithm identify an optimal support.

APA

Dai, Y., Wang, G., Curtis, F.E. & Robinson, D.P.. (2023). A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:5107-5133 Available from https://proceedings.mlr.press/v206/dai23a.html.

A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization

Abstract

Cite this Paper

Related Material