Winning the Lottery Ahead of Time: Efficient Early Network Pruning

John Rachwan, Daniel Zügner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, Stephan Günnemann
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:18293-18309, 2022.

Abstract

Pruning, the task of sparsifying deep neural networks, has received increasing attention recently. Although state-of-the-art pruning methods extract highly sparse models, they neglect two main challenges: (1) the process of finding these sparse models is often very expensive; (2) unstructured pruning does not provide benefits in terms of GPU memory, training time, or carbon emissions. We propose Early Compression via Gradient Flow Preservation (EarlyCroP), which efficiently extracts state-of-the-art sparse models before or early in training, addressing challenge (1), and can be applied in a structured manner, addressing challenge (2). This enables us to train sparse networks on commodity GPUs whose dense versions would be too large, thereby saving costs and reducing hardware requirements. We empirically show that EarlyCroP outperforms a rich set of baselines for many tasks (incl. classification, regression) and domains (incl. computer vision, natural language processing, and reinforcement learning). EarlyCroP leads to accuracy comparable to dense training while outperforming pruning baselines.
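
The sketch below is a minimal, hypothetical illustration of the kind of gradient-flow-preservation criterion the method's name refers to, in the spirit of GraSP-style scoring: weights are ranked by how much their removal would reduce the gradient norm, and the least important ones are masked out before or early in training. It is not the authors' EarlyCroP implementation; the function names, single-batch scoring, sparsity level, and thresholding are illustrative assumptions.

# Hypothetical GraSP-style gradient-flow-preservation scoring sketch
# (not the paper's exact EarlyCroP procedure).
import torch
import torch.nn as nn

def gradient_flow_scores(model, inputs, targets, loss_fn=None):
    """Score each weight by its effect on the gradient norm (gradient flow)."""
    loss_fn = loss_fn or nn.CrossEntropyLoss()
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient g of the loss w.r.t. the weights, keeping the graph.
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Second pass: Hessian-vector product Hg by differentiating g^T stop_grad(g).
    g_dot_g = sum((g * g.detach()).sum() for g in grads)
    hg = torch.autograd.grad(g_dot_g, params)

    # GraSP-style score -theta * (Hg): large values mark weights whose removal
    # harms gradient flow the least, so they are pruned first.
    return [-(p.data * h) for p, h in zip(params, hg)]

def gradient_flow_masks(scores, sparsity=0.9):
    """Keep the (1 - sparsity) fraction of weights with the lowest scores."""
    flat = torch.cat([s.flatten() for s in scores])
    keep = max(1, int((1.0 - sparsity) * flat.numel()))
    threshold = torch.kthvalue(flat, keep).values  # keep-th smallest score
    return [(s <= threshold).float() for s in scores]

In a pruning-before-training or early-pruning setup, such scores would be computed on a mini-batch, the resulting masks applied to the weights, and training then continued on the sparse network only.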

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-rachwan22a,
  title     = {Winning the Lottery Ahead of Time: Efficient Early Network Pruning},
  author    = {Rachwan, John and Z{\"u}gner, Daniel and Charpentier, Bertrand and Geisler, Simon and Ayle, Morgane and G{\"u}nnemann, Stephan},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {18293--18309},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/rachwan22a/rachwan22a.pdf},
  url       = {https://proceedings.mlr.press/v162/rachwan22a.html},
  abstract  = {Pruning, the task of sparsifying deep neural networks, has received increasing attention recently. Although state-of-the-art pruning methods extract highly sparse models, they neglect two main challenges: (1) the process of finding these sparse models is often very expensive; (2) unstructured pruning does not provide benefits in terms of GPU memory, training time, or carbon emissions. We propose Early Compression via Gradient Flow Preservation (EarlyCroP), which efficiently extracts state-of-the-art sparse models before or early in training, addressing challenge (1), and can be applied in a structured manner, addressing challenge (2). This enables us to train sparse networks on commodity GPUs whose dense versions would be too large, thereby saving costs and reducing hardware requirements. We empirically show that EarlyCroP outperforms a rich set of baselines for many tasks (incl. classification, regression) and domains (incl. computer vision, natural language processing, and reinforcement learning). EarlyCroP leads to accuracy comparable to dense training while outperforming pruning baselines.}
}
Endnote
%0 Conference Paper
%T Winning the Lottery Ahead of Time: Efficient Early Network Pruning
%A John Rachwan
%A Daniel Zügner
%A Bertrand Charpentier
%A Simon Geisler
%A Morgane Ayle
%A Stephan Günnemann
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-rachwan22a
%I PMLR
%P 18293--18309
%U https://proceedings.mlr.press/v162/rachwan22a.html
%V 162
%X Pruning, the task of sparsifying deep neural networks, has received increasing attention recently. Although state-of-the-art pruning methods extract highly sparse models, they neglect two main challenges: (1) the process of finding these sparse models is often very expensive; (2) unstructured pruning does not provide benefits in terms of GPU memory, training time, or carbon emissions. We propose Early Compression via Gradient Flow Preservation (EarlyCroP), which efficiently extracts state-of-the-art sparse models before or early in training, addressing challenge (1), and can be applied in a structured manner, addressing challenge (2). This enables us to train sparse networks on commodity GPUs whose dense versions would be too large, thereby saving costs and reducing hardware requirements. We empirically show that EarlyCroP outperforms a rich set of baselines for many tasks (incl. classification, regression) and domains (incl. computer vision, natural language processing, and reinforcement learning). EarlyCroP leads to accuracy comparable to dense training while outperforming pruning baselines.
APA
Rachwan, J., Zügner, D., Charpentier, B., Geisler, S., Ayle, M. & Günnemann, S. (2022). Winning the Lottery Ahead of Time: Efficient Early Network Pruning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:18293-18309. Available from https://proceedings.mlr.press/v162/rachwan22a.html.
