Pfeife: Automatic Pipeline Parallelism for PyTorch

Ho Young Jhoo, Chung-Kil Hur, Nuno P. Lopes
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:27221-27236, 2025.

Abstract

The memory requirements of machine learning (ML) models have been growing quickly. However, the memory capacity of GPUs has not kept pace. Despite significant research on reducing the memory usage of ML models, larger models still do not fit in a single device. A popular solution to the memory capacity issue is to use multiple devices in parallel. In this paper, we focus on a particular form of parallelism called pipelining, as it offers a good balance between cost and performance for many ML models. We present Pfeife, the first tool that integrates with PyTorch to provide automatic pipelining of ML models. Pfeife intercepts the execution of models and parallelizes them transparently, requiring no manual work. We show that Pfeife can execute large models that would otherwise not run due to not fitting in a single device. Moreover, Pfeife can pipeline non-sequential models such as Stable Diffusion, which are not supported by existing pipelining parallelism tools. Pfeife outperforms state-of-the-art tools by up to 22%.
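To illustrate the kind of scheduling that pipeline parallelism relies on, here is a minimal, hypothetical sketch (not Pfeife's actual implementation or API): a model split into S stages processes M micro-batches so that stage s runs micro-batch m at time step s + m, letting different devices work on different micro-batches concurrently. The function name `pipeline_schedule` is an illustrative choice, and the simulation is pure Python with no real GPUs involved.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, for each time step, the list of (stage, microbatch)
    pairs that execute concurrently under a simple forward-only
    GPipe-style pipeline: stage s handles micro-batch m at step s + m."""
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        # Stage s is busy at step t iff its micro-batch index t - s is valid.
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        schedule.append(active)
    return schedule

if __name__ == "__main__":
    # 3 stages, 4 micro-batches: the pipeline fills, runs full, then drains.
    for t, active in enumerate(pipeline_schedule(3, 4)):
        print(t, active)
```

At step 0 only the first stage is busy (the pipeline "bubble"); by step 2 all three stages run concurrently, which is where the throughput gain over naive sequential execution comes from.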

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-jhoo25a,
  title     = {Pfeife: Automatic Pipeline Parallelism for {P}y{T}orch},
  author    = {Jhoo, Ho Young and Hur, Chung-Kil and Lopes, Nuno P.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {27221--27236},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/jhoo25a/jhoo25a.pdf},
  url       = {https://proceedings.mlr.press/v267/jhoo25a.html},
  abstract  = {The memory requirements of machine learning (ML) models have been growing quickly. However, the memory capacity of GPUs has not kept pace. Despite significant research on reducing the memory usage of ML models, larger models still do not fit in a single device. A popular solution to the memory capacity issue is to use multiple devices in parallel. In this paper, we focus on a particular form of parallelism called pipelining, as it offers a good balance between cost and performance for many ML models. We present Pfeife, the first tool that integrates with PyTorch to provide automatic pipelining of ML models. Pfeife intercepts the execution of models and parallelizes them transparently, requiring no manual work. We show that Pfeife can execute large models that would otherwise not run due to not fitting in a single device. Moreover, Pfeife can pipeline non-sequential models such as Stable Diffusion, which are not supported by existing pipelining parallelism tools. Pfeife outperforms state-of-the-art tools by up to 22%.}
}
Endnote
%0 Conference Paper
%T Pfeife: Automatic Pipeline Parallelism for PyTorch
%A Ho Young Jhoo
%A Chung-Kil Hur
%A Nuno P. Lopes
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-jhoo25a
%I PMLR
%P 27221--27236
%U https://proceedings.mlr.press/v267/jhoo25a.html
%V 267
%X The memory requirements of machine learning (ML) models have been growing quickly. However, the memory capacity of GPUs has not kept pace. Despite significant research on reducing the memory usage of ML models, larger models still do not fit in a single device. A popular solution to the memory capacity issue is to use multiple devices in parallel. In this paper, we focus on a particular form of parallelism called pipelining, as it offers a good balance between cost and performance for many ML models. We present Pfeife, the first tool that integrates with PyTorch to provide automatic pipelining of ML models. Pfeife intercepts the execution of models and parallelizes them transparently, requiring no manual work. We show that Pfeife can execute large models that would otherwise not run due to not fitting in a single device. Moreover, Pfeife can pipeline non-sequential models such as Stable Diffusion, which are not supported by existing pipelining parallelism tools. Pfeife outperforms state-of-the-art tools by up to 22%.
APA
Jhoo, H.Y., Hur, C.-K., & Lopes, N.P. (2025). Pfeife: Automatic Pipeline Parallelism for PyTorch. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:27221-27236. Available from https://proceedings.mlr.press/v267/jhoo25a.html.