Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:10792-10810, 2024.
Abstract
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating the convergence of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional source of variance, the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands $\mathcal{O}(d)$ function evaluations (where $d$ is the problem dimension), which incurs substantial computational cost and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction ($\texttt{ZPDVR}$) method, which utilizes the averaging trick to reduce both the sampling and coordinate-wise variances. Compared to prior methods, $\texttt{ZPDVR}$ relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) only $\mathcal{O}(1)$ times per iteration in expectation, and achieves the optimal $\mathcal{O}(d(n + \kappa)\log (\frac{1}{\epsilon}))$ SZO query complexity in the strongly convex and smooth setting, where $\kappa$ denotes the condition number and $\epsilon$ the desired accuracy. Empirical results validate $\texttt{ZPDVR}$'s linear convergence and demonstrate its superior performance over other related methods.
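To make the trade-off described above concrete, the following minimal sketch contrasts the two kinds of ZO gradient estimators the abstract refers to: a coordinate-wise estimator that queries the oracle $\mathcal{O}(d)$ times per estimate, and a random-direction two-point estimator that queries it $\mathcal{O}(1)$ times but carries extra coordinate-wise variance. The function names and the quadratic test objective are illustrative assumptions, not the paper's implementation of $\texttt{ZPDVR}$.

```python
import numpy as np

def coord_grad_est(f, x, mu=1e-5):
    """Coordinate-wise estimator: 2*d function evaluations per call (O(d) SZO queries)."""
    d = x.size
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1.0
        g[i] = (f(x + mu * e) - f(x - mu * e)) / (2 * mu)
    return g

def random_grad_est(f, x, mu=1e-5, rng=None):
    """Random-direction two-point estimator: 2 function evaluations per call (O(1) SZO
    queries), unbiased for the smoothed objective but with added coordinate-wise variance."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.size)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    return x.size * (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Illustrative smooth objective (hypothetical, not from the paper): ridge-regularized least squares.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 / 50 + 5e-3 * np.linalg.norm(x) ** 2

x0 = np.zeros(20)
print("coord-wise vs. random-direction estimate gap:",
      np.linalg.norm(coord_grad_est(f, x0) - random_grad_est(f, x0)))
```

In this sketch the random-direction estimate fluctuates around the coordinate-wise one across draws of the direction $u$; averaging such cheap estimates over iterations, rather than paying $\mathcal{O}(d)$ queries for each full estimate, is the kind of trade-off the averaging trick in $\texttt{ZPDVR}$ targets.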