On the Iteration Complexity of Hypergradient Computation

Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3748-3758, 2020.

Abstract

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed point equation is defined by a contraction mapping, we present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirm the theoretical findings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-grazzi20a, title = {On the Iteration Complexity of Hypergradient Computation}, author = {Grazzi, Riccardo and Franceschi, Luca and Pontil, Massimiliano and Salzo, Saverio}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {3748--3758}, year = {2020}, editor = {Hal Daumé III and Aarti Singh}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/grazzi20a/grazzi20a.pdf}, url = { http://proceedings.mlr.press/v119/grazzi20a.html }, abstract = {We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed point equation is defined by a contraction mapping, we present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirm the theoretical findings.} }
Endnote
%0 Conference Paper %T On the Iteration Complexity of Hypergradient Computation %A Riccardo Grazzi %A Luca Franceschi %A Massimiliano Pontil %A Saverio Salzo %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-grazzi20a %I PMLR %P 3748--3758 %U http://proceedings.mlr.press/v119/grazzi20a.html %V 119 %X We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed point equation is defined by a contraction mapping, we present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirm the theoretical findings.
APA
Grazzi, R., Franceschi, L., Pontil, M. & Salzo, S.. (2020). On the Iteration Complexity of Hypergradient Computation. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3748-3758 Available from http://proceedings.mlr.press/v119/grazzi20a.html .

Related Material