Zeroth-order Optimization with Weak Dimension Dependency
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:4429-4472, 2023.
Abstract
Zeroth-order optimization is a fundamental research topic that underlies various learning tasks, such as black-box adversarial attacks, bandits, and reinforcement learning. However, in theory, most complexity results assert a linear dependency on the dimension of the optimization variable, which renders zeroth-order algorithms impractical for high-dimensional problems and cannot explain their effectiveness in practice. In this paper, we present a novel zeroth-order optimization theory characterized by complexities that exhibit weak dependencies on dimensionality. The key contribution is a new factor, denoted by $\mathrm{ED}_{\alpha} = \sup_{x\in \mathbb{R}^d} \sum_{i=1}^d \sigma_i^\alpha(\nabla^2 f(x))$ ($\alpha>0$, where $\sigma_i(\cdot)$ is the $i$-th singular value in non-increasing order), which effectively serves as a measure of dimensionality. The algorithms we propose demonstrate significantly reduced complexities when measured in terms of $\mathrm{ED}_{\alpha}$. Specifically, we first study a well-known zeroth-order algorithm from Nesterov and Spokoiny (2017) on quadratic objectives and show a complexity of $\mathcal{O}\left(\frac{\mathrm{ED}_1}{\sigma_d}\log(1/\epsilon)\right)$ in the strongly convex setting. For linear regression, this complexity is dimension-free and improves upon the traditional result by a factor of $d$ under common conditions. Furthermore, we introduce novel algorithms that leverage the Heavy-ball mechanism to accelerate the optimization process. With this acceleration scheme, the proposed algorithm attains a complexity of $\mathcal{O}\left(\frac{\mathrm{ED}_{1/2}}{\sqrt{\sigma_d}}\cdot\log\frac{L}{\mu}\cdot\log(1/\epsilon)\right)$. For linear regression, under mild conditions, it is faster than state-of-the-art algorithms by a factor of $\sqrt{d}$. We further extend the method to generic smooth optimization problems under an additional Hessian-smoothness condition, which makes our approach applicable to a broader range of optimization scenarios. The resulting algorithms achieve complexities whose dominant terms are dimension-independent and improve upon existing algorithms by a factor of order $d$ under appropriate conditions. Our analysis lays the foundation for investigating zeroth-order optimization methods for smooth functions in high-dimensional settings.
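To make the role of the effective-dimension factor concrete, the following minimal sketch (not taken from the paper; it assumes NumPy and a toy linear-regression Hessian with a polynomially decaying spectrum) compares the classical strongly convex factor $d\,L/\mu$ with the factors $\mathrm{ED}_1/\sigma_d$ and $\mathrm{ED}_{1/2}/\sqrt{\sigma_d}$ that appear in the complexities above; when the singular values decay quickly, the latter can be far smaller than the former.

    # Illustrative sketch only: toy linear-regression Hessian with decaying
    # spectrum (an assumed setting, not data or code from the paper).
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 500, 1000
    # Design matrix whose column scales decay polynomially, so the Hessian
    # H = X^T X / n has a fast-decaying spectrum.
    scales = np.arange(1, d + 1, dtype=float) ** -0.6
    X = rng.standard_normal((n, d)) * scales
    H = X.T @ X / n                              # Hessian of the least-squares objective

    sigma = np.linalg.svd(H, compute_uv=False)   # singular values, non-increasing
    L, mu = sigma[0], sigma[-1]                  # largest / smallest singular value
    ED_1 = sigma.sum()                           # ED_1   = sum_i sigma_i
    ED_half = np.sqrt(sigma).sum()               # ED_1/2 = sum_i sigma_i^{1/2}

    print("classical factor   d * L / mu            :", d * L / mu)
    print("this paper         ED_1 / sigma_d        :", ED_1 / mu)
    print("accelerated factor ED_1/2 / sqrt(sigma_d):", ED_half / np.sqrt(mu))

Since $\mathrm{ED}_1/\sigma_d \le d\,L/\mu$ always holds, the printed values illustrate how much is gained when the spectrum decays, which is the regime in which the complexities above become effectively dimension-free.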