HiRemate: Hierarchical Approach for Efficient Re-materialization of Neural Networks

Julia Gusak, Xunyi Zhao, Théotime Le Hellard, Zhe Li, Lionel Eyraud-Dubois, Olivier Beaumont
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:21418-21443, 2025.

Abstract

Training deep neural networks (DNNs) on memory-limited GPUs is challenging, as storing intermediate activations often exceeds available memory. Re-materialization, a technique that preserves exact computations, addresses this by selectively recomputing activations instead of storing them. However, existing methods either fail to scale, lack generality, or introduce excessive execution overhead. We introduce HiRemate, a hierarchical re-materialization framework that recursively partitions large computation graphs, applies optimized solvers at multiple levels, and merges solutions into a globally efficient training schedule. This enables scalability to significantly larger graphs than prior ILP-based methods while keeping runtime overhead low. Designed for single-GPU models and activation re-materialization, HiRemate extends the feasibility of training networks with thousands of graph nodes, surpassing prior methods in both efficiency and scalability. Experiments on various types of networks yield up to 50-70% memory reduction with only 10-15% overhead, closely matching optimal solutions while significantly reducing solver time. Seamlessly integrating with PyTorch Autograd, HiRemate requires almost no code change to use, enabling broad adoption in memory-constrained deep learning.
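For readers unfamiliar with activation re-materialization, the sketch below illustrates the underlying memory/compute trade-off using PyTorch's built-in torch.utils.checkpoint utilities. This is a generic illustration of the technique the paper builds on, not HiRemate's own API; the model, batch size, and segment count are arbitrary illustrative choices.

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint_sequential

    # A deep sequential model whose intermediate activations would normally
    # all be kept in memory until the backward pass.
    model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                            for _ in range(32)])
    x = torch.randn(64, 1024, requires_grad=True)

    # Split the layer sequence into 4 segments: only segment boundaries are
    # stored, and activations inside each segment are recomputed during the
    # backward pass, trading extra compute for lower peak memory.
    # (use_reentrant=False requires a recent PyTorch release.)
    out = checkpoint_sequential(model, 4, x, use_reentrant=False)
    out.sum().backward()

Where this simple utility only handles sequential chains with a fixed segmentation, HiRemate targets arbitrary computation graphs, recursively partitioning them and running re-materialization solvers on each part before merging the resulting schedules.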

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-gusak25a,
  title     = {{H}i{R}emate: Hierarchical Approach for Efficient Re-materialization of Neural Networks},
  author    = {Gusak, Julia and Zhao, Xunyi and Le Hellard, Th\'{e}otime and Li, Zhe and Eyraud-Dubois, Lionel and Beaumont, Olivier},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {21418--21443},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gusak25a/gusak25a.pdf},
  url       = {https://proceedings.mlr.press/v267/gusak25a.html},
  abstract  = {Training deep neural networks (DNNs) on memory-limited GPUs is challenging, as storing intermediate activations often exceeds available memory. Re-materialization, a technique that preserves exact computations, addresses this by selectively recomputing activations instead of storing them. However, existing methods either fail to scale, lack generality, or introduce excessive execution overhead. We introduce HiRemate, a \textit{hierarchical} re-materialization framework that recursively partitions large computation graphs, applies optimized solvers at multiple levels, and merges solutions into a globally efficient training schedule. This enables scalability to significantly larger graphs than prior ILP-based methods while keeping runtime overhead low. Designed for single-GPU models and activation re-materialization, HiRemate extends the feasibility of training networks with thousands of graph nodes, surpassing prior methods in both efficiency and scalability. Experiments on various types of networks yield up to 50-70% memory reduction with only 10-15% overhead, closely matching optimal solutions while significantly reducing solver time. Seamlessly integrating with PyTorch Autograd, HiRemate requires almost no code change to use, enabling broad adoption in memory-constrained deep learning.}
}
Endnote
%0 Conference Paper
%T HiRemate: Hierarchical Approach for Efficient Re-materialization of Neural Networks
%A Julia Gusak
%A Xunyi Zhao
%A Théotime Le Hellard
%A Zhe Li
%A Lionel Eyraud-Dubois
%A Olivier Beaumont
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-gusak25a
%I PMLR
%P 21418--21443
%U https://proceedings.mlr.press/v267/gusak25a.html
%V 267
%X Training deep neural networks (DNNs) on memory-limited GPUs is challenging, as storing intermediate activations often exceeds available memory. Re-materialization, a technique that preserves exact computations, addresses this by selectively recomputing activations instead of storing them. However, existing methods either fail to scale, lack generality, or introduce excessive execution overhead. We introduce HiRemate, a hierarchical re-materialization framework that recursively partitions large computation graphs, applies optimized solvers at multiple levels, and merges solutions into a globally efficient training schedule. This enables scalability to significantly larger graphs than prior ILP-based methods while keeping runtime overhead low. Designed for single-GPU models and activation re-materialization, HiRemate extends the feasibility of training networks with thousands of graph nodes, surpassing prior methods in both efficiency and scalability. Experiments on various types of networks yield up to 50-70% memory reduction with only 10-15% overhead, closely matching optimal solutions while significantly reducing solver time. Seamlessly integrating with PyTorch Autograd, HiRemate requires almost no code change to use, enabling broad adoption in memory-constrained deep learning.
APA
Gusak, J., Zhao, X., Le Hellard, T., Li, Z., Eyraud-Dubois, L. & Beaumont, O. (2025). HiRemate: Hierarchical Approach for Efficient Re-materialization of Neural Networks. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:21418-21443. Available from https://proceedings.mlr.press/v267/gusak25a.html.
