Double Momentum Method for Lower-Level Constrained Bilevel Optimization

Wanli Shi, Yi Chang, Bin Gu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:44838-44864, 2024.

Abstract

Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Recently, many hypergradient methods have been proposed as effective solutions for solving large-scale problems. However, current hypergradient methods for the lower-level constrained bilevel optimization (LCBO) problems need very restrictive assumptions, namely, where optimality conditions satisfy the differentiability and invertibility conditions, and lack a solid analysis of the convergence rate. What’s worse, existing methods require either double-loop updates, which are sometimes less efficient. To solve this problem, in this paper, we propose a new hypergradient of LCBO leveraging the theory of nonsmooth implicit function theorem instead of using the restrive assumptions. In addition, we propose a single-loop single-timescale algorithm based on the double-momentum method and adaptive step size method and prove it can return a $(\delta, \epsilon)$-stationary point with $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of our proposed method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-shi24a, title = {Double Momentum Method for Lower-Level Constrained Bilevel Optimization}, author = {Shi, Wanli and Chang, Yi and Gu, Bin}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {44838--44864}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/shi24a/shi24a.pdf}, url = {https://proceedings.mlr.press/v235/shi24a.html}, abstract = {Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Recently, many hypergradient methods have been proposed as effective solutions for solving large-scale problems. However, current hypergradient methods for the lower-level constrained bilevel optimization (LCBO) problems need very restrictive assumptions, namely, where optimality conditions satisfy the differentiability and invertibility conditions, and lack a solid analysis of the convergence rate. What’s worse, existing methods require either double-loop updates, which are sometimes less efficient. To solve this problem, in this paper, we propose a new hypergradient of LCBO leveraging the theory of nonsmooth implicit function theorem instead of using the restrive assumptions. In addition, we propose a single-loop single-timescale algorithm based on the double-momentum method and adaptive step size method and prove it can return a $(\delta, \epsilon)$-stationary point with $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of our proposed method.} }
Endnote
%0 Conference Paper %T Double Momentum Method for Lower-Level Constrained Bilevel Optimization %A Wanli Shi %A Yi Chang %A Bin Gu %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-shi24a %I PMLR %P 44838--44864 %U https://proceedings.mlr.press/v235/shi24a.html %V 235 %X Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Recently, many hypergradient methods have been proposed as effective solutions for solving large-scale problems. However, current hypergradient methods for the lower-level constrained bilevel optimization (LCBO) problems need very restrictive assumptions, namely, where optimality conditions satisfy the differentiability and invertibility conditions, and lack a solid analysis of the convergence rate. What’s worse, existing methods require either double-loop updates, which are sometimes less efficient. To solve this problem, in this paper, we propose a new hypergradient of LCBO leveraging the theory of nonsmooth implicit function theorem instead of using the restrive assumptions. In addition, we propose a single-loop single-timescale algorithm based on the double-momentum method and adaptive step size method and prove it can return a $(\delta, \epsilon)$-stationary point with $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of our proposed method.
APA
Shi, W., Chang, Y. & Gu, B.. (2024). Double Momentum Method for Lower-Level Constrained Bilevel Optimization. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:44838-44864 Available from https://proceedings.mlr.press/v235/shi24a.html.

Related Material