Scalable Meta-Learning via Mixed-Mode Differentiation

Iurii Kemaev, Dan A. Calian, Luisa M Zintgraf, Gregory Farquhar, Hado Van Hasselt
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:29687-29705, 2025.

Abstract

Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation process itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG – a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.
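To make the core idea concrete, below is a minimal sketch (not the authors' released implementation) of mixed-mode differentiation for a one-step meta-gradient in JAX. The functions inner_loss and outer_loss and the single-step setup are illustrative assumptions; the point is that pairing forward mode (jax.jvp) with reverse mode (jax.grad) turns the expensive mixed second-order term into a cheap Jacobian-vector product, instead of building a reverse-over-reverse graph.

import jax
import jax.numpy as jnp

def inner_loss(theta, eta):
    # Illustrative inner objective; eta is a meta-parameter (here, an L2 weight).
    return jnp.sum((theta - 1.0) ** 2) + eta * jnp.sum(theta ** 2)

def outer_loss(theta):
    # Illustrative outer (validation) objective.
    return jnp.sum(theta ** 2)

def inner_step(theta, eta, lr=0.1):
    # One step of the inner gradient-based optimisation.
    return theta - lr * jax.grad(inner_loss)(theta, eta)

def meta_grad_reverse(theta, eta):
    # Baseline: reverse-mode through the whole inner update
    # (reverse-over-reverse), which stores large intermediate graphs.
    return jax.grad(lambda e: outer_loss(inner_step(theta, e)))(eta)

def meta_grad_mixed(theta, eta, lr=0.1):
    # Mixed-mode: dL_outer/deta = -lr * g_outer . d(grad_theta L_inner)/deta.
    # The mixed second-order term is obtained by pushing a forward-mode
    # tangent in eta through the reverse-mode inner gradient.
    theta_new = inner_step(theta, eta, lr)
    g_outer = jax.grad(outer_loss)(theta_new)
    _, mixed_term = jax.jvp(
        lambda e: jax.grad(inner_loss)(theta, e),  # reverse-mode inner gradient
        (eta,), (jnp.ones_like(eta),)              # forward-mode tangent in eta
    )
    return -lr * jnp.vdot(g_outer, mixed_term)

theta0, eta0 = jnp.array([0.5, 2.0]), jnp.array(0.3)
print(meta_grad_reverse(theta0, eta0))  # baseline meta-gradient
print(meta_grad_mixed(theta0, eta0))    # agrees with the baseline

Both calls return the same meta-gradient; the difference lies in the computational graph that is built. The abstract's reported memory and wall-clock gains come from applying this kind of forward-over-reverse pairing to the much larger graphs of modern meta-learning setups.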

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-kemaev25a,
  title     = {Scalable Meta-Learning via Mixed-Mode Differentiation},
  author    = {Kemaev, Iurii and Calian, Dan A. and Zintgraf, Luisa M and Farquhar, Gregory and Van Hasselt, Hado},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {29687--29705},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kemaev25a/kemaev25a.pdf},
  url       = {https://proceedings.mlr.press/v267/kemaev25a.html},
  abstract  = {Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation process itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG – a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.}
}
Endnote
%0 Conference Paper
%T Scalable Meta-Learning via Mixed-Mode Differentiation
%A Iurii Kemaev
%A Dan A. Calian
%A Luisa M Zintgraf
%A Gregory Farquhar
%A Hado Van Hasselt
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kemaev25a
%I PMLR
%P 29687--29705
%U https://proceedings.mlr.press/v267/kemaev25a.html
%V 267
%X Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation process itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG – a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.
APA
Kemaev, I., Calian, D.A., Zintgraf, L.M., Farquhar, G. & Van Hasselt, H. (2025). Scalable Meta-Learning via Mixed-Mode Differentiation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:29687-29705. Available from https://proceedings.mlr.press/v267/kemaev25a.html.
