Advancing Dynamic Sparse Training by Exploring Optimization Opportunities

Jie Ji; Gen Li; Lu Yin; Minghai Qin; Geng Yuan; Linke Guo; Shiwei Liu; Xiaolong Ma

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21606-21619, 2024.

Abstract

Dynamic Sparse Training (DST) is an effective approach for addressing the substantial training resource requirements posed by the ever-increasing size of the Deep Neural Networks (DNNs). Characterized by its dynamic "train-prune-grow" schedule during training, DST implicitly develops a bi-level structure for training the weights while discovering a subnetwork topology. However, such a structure is consistently overlooked by the current DST algorithms for further optimization opportunities, and these algorithms, on the other hand, solely optimize the weights while determining masks heuristically. In this paper, we extensively study DST algorithms and argue that the training scheme of DST naturally forms a bi-level problem in which the updating of weight and mask is interdependent. Based on this observation, we introduce a novel efficient training framework called BiDST, which for the first time, introduces bi-level optimization methodology into dynamic sparse training domain. Unlike traditional partial-heuristic DST schemes, which suffer from sub-optimal search efficiency for masks and miss the opportunity to fully explore the topological space of neural networks, BiDST excels at discovering excellent sparse patterns by optimizing mask and weight simultaneously, resulting in maximum 2.62% higher accuracy, 2.1$\times$ faster execution speed, and 25$\times$ reduced overhead. Code available at https://github.com/jjsrf/BiDST-ICML2024.
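
For readers unfamiliar with the "train-prune-grow" schedule mentioned in the abstract, the following is a minimal sketch of one heuristic mask update of the kind the abstract attributes to existing DST algorithms. It is not BiDST and is not taken from the paper or its repository: the function name `prune_and_grow`, the magnitude-based pruning rule, the random growth rule, and the `fraction` parameter are all illustrative assumptions.

```python
import random


def prune_and_grow(weights, mask, fraction, rng):
    """One heuristic mask update (illustrative, not the BiDST method).

    Drops the smallest-magnitude active weights, then re-activates the
    same number of randomly chosen positions that were inactive before.
    """
    active = [i for i, m in enumerate(mask) if m == 1]
    inactive = [i for i, m in enumerate(mask) if m == 0]
    # Number of connections to swap; capped by the available inactive slots.
    k = min(int(fraction * len(active)), len(inactive))

    # Prune: the k active weights with the smallest absolute value.
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: k positions sampled from those inactive before this update.
    grown = rng.sample(inactive, k)

    new_weights = list(weights)
    new_mask = list(mask)
    for i in pruned:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = 1
        new_weights[i] = 0.0  # assumption: grown weights start at zero
    return new_weights, new_mask


rng = random.Random(0)
weights = [0.9, -0.05, 0.0, 0.4, 0.0, -0.7, 0.02, 0.0]
mask = [1, 1, 0, 1, 0, 1, 1, 0]
new_weights, new_mask = prune_and_grow(weights, mask, fraction=0.5, rng=rng)
```

In this sketch the mask is chosen by fixed rules and only the weights would be trained between updates, so the number of active connections stays constant. The abstract's point is that BiDST instead treats the mask as something to be optimized jointly with the weights.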

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-ji24a,
  title     = {Advancing Dynamic Sparse Training by Exploring Optimization Opportunities},
  author    = {Ji, Jie and Li, Gen and Yin, Lu and Qin, Minghai and Yuan, Geng and Guo, Linke and Liu, Shiwei and Ma, Xiaolong},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {21606--21619},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ji24a/ji24a.pdf},
  url       = {https://proceedings.mlr.press/v235/ji24a.html},
  abstract  = {Dynamic Sparse Training (DST) is an effective approach for addressing the substantial training resource requirements posed by the ever-increasing size of the Deep Neural Networks (DNNs). Characterized by its dynamic ``train-prune-grow'' schedule during training, DST implicitly develops a bi-level structure for training the weights while discovering a subnetwork topology. However, such a structure is consistently overlooked by the current DST algorithms for further optimization opportunities, and these algorithms, on the other hand, solely optimize the weights while determining masks heuristically. In this paper, we extensively study DST algorithms and argue that the training scheme of DST naturally forms a bi-level problem in which the updating of weight and mask is interdependent. Based on this observation, we introduce a novel efficient training framework called BiDST, which for the first time, introduces bi-level optimization methodology into dynamic sparse training domain. Unlike traditional partial-heuristic DST schemes, which suffer from sub-optimal search efficiency for masks and miss the opportunity to fully explore the topological space of neural networks, BiDST excels at discovering excellent sparse patterns by optimizing mask and weight simultaneously, resulting in maximum 2.62\% higher accuracy, 2.1$\times$ faster execution speed, and 25$\times$ reduced overhead. Code available at https://github.com/jjsrf/BiDST-ICML2024.}
}

Endnote

%0 Conference Paper
%T Advancing Dynamic Sparse Training by Exploring Optimization Opportunities
%A Jie Ji
%A Gen Li
%A Lu Yin
%A Minghai Qin
%A Geng Yuan
%A Linke Guo
%A Shiwei Liu
%A Xiaolong Ma
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ji24a
%I PMLR
%P 21606--21619
%U https://proceedings.mlr.press/v235/ji24a.html
%V 235
%X Dynamic Sparse Training (DST) is an effective approach for addressing the substantial training resource requirements posed by the ever-increasing size of the Deep Neural Networks (DNNs). Characterized by its dynamic "train-prune-grow" schedule during training, DST implicitly develops a bi-level structure for training the weights while discovering a subnetwork topology. However, such a structure is consistently overlooked by the current DST algorithms for further optimization opportunities, and these algorithms, on the other hand, solely optimize the weights while determining masks heuristically. In this paper, we extensively study DST algorithms and argue that the training scheme of DST naturally forms a bi-level problem in which the updating of weight and mask is interdependent. Based on this observation, we introduce a novel efficient training framework called BiDST, which for the first time, introduces bi-level optimization methodology into dynamic sparse training domain. Unlike traditional partial-heuristic DST schemes, which suffer from sub-optimal search efficiency for masks and miss the opportunity to fully explore the topological space of neural networks, BiDST excels at discovering excellent sparse patterns by optimizing mask and weight simultaneously, resulting in maximum 2.62% higher accuracy, 2.1$\times$ faster execution speed, and 25$\times$ reduced overhead. Code available at https://github.com/jjsrf/BiDST-ICML2024.

APA


Ji, J., Li, G., Yin, L., Qin, M., Yuan, G., Guo, L., Liu, S. & Ma, X. (2024). Advancing Dynamic Sparse Training by Exploring Optimization Opportunities. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21606-21619. Available from https://proceedings.mlr.press/v235/ji24a.html.
