Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity

Risheng Liu, Yaohua Liu, Wei Yao, Shangzhi Zeng, Jin Zhang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:21839-21866, 2023.

Abstract

Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in machine learning. The validity of existing works relies heavily on a restrictive Lower-Level Strong Convexity (LLSC) condition, on solving a series of approximation subproblems to high accuracy, or on both. In this work, by averaging the upper- and lower-level objectives, we propose a single-loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale problems and dispenses with the restrictive LLSC condition. We further provide a non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points; the comparative advantage of our analysis lies in the absence of a strong gradient-boundedness assumption, which is required by existing analyses. Our theory therefore covers a wider range of deep learning applications, in particular those where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.
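As a quick orientation for readers, the following is a minimal LaTeX sketch of the standard BLO problem and of the objective-averaging idea the abstract refers to. The aggregation weight c_k and the particular aggregated objective shown are illustrative assumptions, not the paper's exact sl-BAMM construction.

% Standard bi-level optimization (BLO) problem:
% F is the upper-level (UL) objective, f the lower-level (LL) objective.
\begin{align}
  \min_{x,\, y} \quad & F(x, y) && \text{(upper level)} \\
  \text{s.t.} \quad & y \in \operatorname*{arg\,min}_{y'} f(x, y') && \text{(lower level)}
\end{align}
% "Averaging the upper and lower level objectives": rather than solving the
% lower-level problem exactly, one can track minimizers of an aggregated
% objective such as
\begin{equation}
  f_{c_k}(x, y) \;=\; c_k\, F(x, y) + f(x, y), \qquad c_k \downarrow 0,
\end{equation}
% which can remain well posed when f(x, \cdot) is merely convex rather than
% strongly convex. (The weight c_k and this specific aggregation are
% assumptions for illustration; the paper's sl-BAMM scheme additionally
% uses multiplier updates within a single loop.)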

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-liu23y,
  title     = {Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity},
  author    = {Liu, Risheng and Liu, Yaohua and Yao, Wei and Zeng, Shangzhi and Zhang, Jin},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {21839--21866},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/liu23y/liu23y.pdf},
  url       = {https://proceedings.mlr.press/v202/liu23y.html},
  abstract  = {Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in learning fields. The validity of existing works heavily rely on either a restrictive Lower-Level Strong Convexity (LLSC) condition or on solving a series of approximation subproblems with high accuracy or both. In this work, by averaging the upper and lower level objectives, we propose a single loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale BLO and gets rid of the limited LLSC restriction. We further provide non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points, and the comparative advantage of our analysis lies in the absence of strong gradient boundedness assumption, which is always required by others. Thus our theory safely captures a wider variety of applications in deep learning, especially where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.}
}
Endnote
%0 Conference Paper
%T Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity
%A Risheng Liu
%A Yaohua Liu
%A Wei Yao
%A Shangzhi Zeng
%A Jin Zhang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-liu23y
%I PMLR
%P 21839--21866
%U https://proceedings.mlr.press/v202/liu23y.html
%V 202
%X Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in learning fields. The validity of existing works heavily rely on either a restrictive Lower-Level Strong Convexity (LLSC) condition or on solving a series of approximation subproblems with high accuracy or both. In this work, by averaging the upper and lower level objectives, we propose a single loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale BLO and gets rid of the limited LLSC restriction. We further provide non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points, and the comparative advantage of our analysis lies in the absence of strong gradient boundedness assumption, which is always required by others. Thus our theory safely captures a wider variety of applications in deep learning, especially where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.
APA
Liu, R., Liu, Y., Yao, W., Zeng, S. & Zhang, J. (2023). Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:21839-21866. Available from https://proceedings.mlr.press/v202/liu23y.html.