On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis

Lesi Chen; Jing Xu; Jingzhao Zhang

On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis

Lesi Chen, Jing Xu, Jingzhao Zhang

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:947-980, 2024.

Abstract

Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where \textit{the lower-level functions lack strong convexity}. In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Ł{}ojasiewicz (PL) condition. We show a simple first-order algorithm can achieve complexity bounds of

$\tilde{\mathcal{O}}(\epsilon^{-2})$ ,

$\tilde{\mathcal{O}}(\epsilon^{-4})$ and

$\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic setting respectively.

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-chen24a,
  title = 	 {On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis},
  author =       {Chen, Lesi and Xu, Jing and Zhang, Jingzhao},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {947--980},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/chen24a/chen24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/chen24a.html},
  abstract = 	 {Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning.  A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where \textit{the lower-level functions lack strong convexity}.  In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Ł{}ojasiewicz (PL) condition. We show a simple first-order algorithm can achieve complexity bounds of $\tilde{\mathcal{O}}(\epsilon^{-2})$, $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic setting respectively.}
}

Endnote

%0 Conference Paper
%T On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis
%A Lesi Chen
%A Jing Xu
%A Jingzhao Zhang
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth	
%F pmlr-v247-chen24a
%I PMLR
%P 947--980
%U https://proceedings.mlr.press/v247/chen24a.html
%V 247
%X Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning.  A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where \textit{the lower-level functions lack strong convexity}.  In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Ł{}ojasiewicz (PL) condition. We show a simple first-order algorithm can achieve complexity bounds of $\tilde{\mathcal{O}}(\epsilon^{-2})$, $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic setting respectively.

APA


Chen, L., Xu, J. & Zhang, J.. (2024). On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:947-980 Available from https://proceedings.mlr.press/v247/chen24a.html.

Related Material

Download PDF