Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization

Zhenzhang Ye, Gabriel Peyré, Daniel Cremers, Pierre Ablin
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:955-963, 2024.

Abstract

Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute the so-called hypergradient of the outer problem is to use the Implicit Function Theorem (IFT). As a function of the error of the inner problem resolution, we study the error of the IFT method. We analyze two strategies to reduce this error: preconditioning the IFT formula and reparameterizing the inner problem. We give a detailed account of the impact of these two modifications on the error, highlighting the role played by higher-order derivatives of the functionals at stake. Our theoretical findings explain when super efficiency, namely reaching an error on the hypergradient that depends quadratically on the error on the inner problem, is achievable and compare the two approaches when this is impossible. Numerical evaluations on hyperparameter tuning for regression problems substantiate our theoretical findings.
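To make the IFT-based hypergradient concrete, here is a minimal numerical sketch (not taken from the paper) for the ridge-regression hyperparameter example: the inner problem fits weights for a given regularization strength, and the hypergradient of the validation loss is computed via the Implicit Function Theorem. All variable names and the specific problem instance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((20, 5)), rng.standard_normal(20)    # training data
Xv, yv = rng.standard_normal((10, 5)), rng.standard_normal(10)  # validation data
lam = 0.1  # hyperparameter (ridge regularization strength)

# Inner problem: w(lam) = argmin_w 0.5*||Xw - y||^2 + 0.5*lam*||w||^2
# (solved exactly here; in practice one only has an approximate solution,
# which is the source of the hypergradient error studied in the paper)
H = X.T @ X + lam * np.eye(5)        # inner Hessian w.r.t. w
w = np.linalg.solve(H, X.T @ y)

# Outer objective: validation loss f(w) = 0.5*||Xv w - yv||^2
grad_f = Xv.T @ (Xv @ w - yv)        # gradient of outer objective w.r.t. w

# IFT formula: dF/dlam = -grad_f^T H^{-1} d(grad_w g)/dlam, where the
# cross-derivative of the inner gradient Xt(Xw - y) + lam*w w.r.t. lam is w
hypergrad = -grad_f @ np.linalg.solve(H, w)
```

A quick sanity check is to compare `hypergrad` against a central finite difference of the validation loss in `lam`; the two should agree to high precision since the inner problem is solved exactly here.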

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-ye24a,
  title     = {Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization},
  author    = {Ye, Zhenzhang and Peyr\'{e}, Gabriel and Cremers, Daniel and Ablin, Pierre},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {955--963},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/ye24a/ye24a.pdf},
  url       = {https://proceedings.mlr.press/v238/ye24a.html},
  abstract  = {Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute the so-called hypergradient of the outer problem is to use the Implicit Function Theorem (IFT). As a function of the error of the inner problem resolution, we study the error of the IFT method. We analyze two strategies to reduce this error: preconditioning the IFT formula and reparameterizing the inner problem. We give a detailed account of the impact of these two modifications on the error, highlighting the role played by higher-order derivatives of the functionals at stake. Our theoretical findings explain when super efficiency, namely reaching an error on the hypergradient that depends quadratically on the error on the inner problem, is achievable and compare the two approaches when this is impossible. Numerical evaluations on hyperparameter tuning for regression problems substantiate our theoretical findings.}
}
Endnote
%0 Conference Paper
%T Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization
%A Zhenzhang Ye
%A Gabriel Peyré
%A Daniel Cremers
%A Pierre Ablin
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-ye24a
%I PMLR
%P 955--963
%U https://proceedings.mlr.press/v238/ye24a.html
%V 238
%X Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute the so-called hypergradient of the outer problem is to use the Implicit Function Theorem (IFT). As a function of the error of the inner problem resolution, we study the error of the IFT method. We analyze two strategies to reduce this error: preconditioning the IFT formula and reparameterizing the inner problem. We give a detailed account of the impact of these two modifications on the error, highlighting the role played by higher-order derivatives of the functionals at stake. Our theoretical findings explain when super efficiency, namely reaching an error on the hypergradient that depends quadratically on the error on the inner problem, is achievable and compare the two approaches when this is impossible. Numerical evaluations on hyperparameter tuning for regression problems substantiate our theoretical findings.
APA
Ye, Z., Peyré, G., Cremers, D. &amp; Ablin, P. (2024). Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:955-963. Available from https://proceedings.mlr.press/v238/ye24a.html.