Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization

Aleksandra Nowak; Łukasz Gniecki; Filip Szatkowski; Jacek Tabor

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:38474-38494, 2024.

Abstract

Static sparse training aims to train sparse models from scratch and has achieved remarkable results in recent years. A key design choice is the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select this mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask’s potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps stabilize the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. In contrast to existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary densities. We demonstrate the effectiveness and efficiency of EOI through experiments, in which it consistently outperforms common sparse initialization techniques. Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques, emphasizing the crucial role of weight initialization in static sparse training alongside sparse mask selection.
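
The idea of composing random Givens rotations can be illustrated with a minimal sketch. This is not the authors' EOI implementation: the function names (`random_givens_product`, `density`, `orthogonality_error`), the uniform sampling of row pairs and angles, and the fixed rotation count are all assumptions made for illustration. It only shows that a product of Givens rotations is orthogonal up to floating-point error, and that applying few rotations to the identity leaves many entries at zero.

```python
import math
import random


def random_givens_product(n, num_rotations, seed=0):
    """Compose random Givens rotations into an n x n orthogonal matrix.

    Hypothetical helper: the sampling scheme here is an assumption,
    not the procedure described in the paper.
    """
    rng = random.Random(seed)
    # Start from the identity, which is orthogonal and maximally sparse.
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(num_rotations):
        i, j = rng.sample(range(n), 2)  # two distinct row indices
        theta = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        # Left-multiplying by a Givens rotation only mixes rows i and j.
        row_i, row_j = q[i], q[j]
        q[i] = [c * a - s * b for a, b in zip(row_i, row_j)]
        q[j] = [s * a + c * b for a, b in zip(row_i, row_j)]
    return q


def density(m, tol=1e-12):
    """Fraction of entries whose magnitude exceeds tol."""
    n = len(m)
    return sum(abs(x) > tol for row in m for x in row) / (n * n)


def orthogonality_error(m):
    """Largest absolute entry of (M^T M - I)."""
    n = len(m)
    return max(
        abs(sum(m[k][i] * m[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    q = random_givens_product(n=8, num_rotations=5)
    print("density:", density(q))
    print("orthogonality error:", orthogonality_error(q))
```

In this sketch, increasing `num_rotations` raises the density of the result while orthogonality is preserved, which is one simple way to see how a rotation budget could trade off against sparsity.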

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-nowak24a,
  title = 	 {Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization},
  author =       {Nowak, Aleksandra and Gniecki, {\L}ukasz and Szatkowski, Filip and Tabor, Jacek},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {38474--38494},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/nowak24a/nowak24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/nowak24a.html},
  abstract = 	 {Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years. A key design choice is given by the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select such mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask’s potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps in stabilizing the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. Contrary to other existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary densities. We demonstrate the superior effectiveness and efficiency of EOI through experiments, consistently outperforming common sparse initialization techniques. Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques, emphasizing the crucial role of weight initialization in static sparse training alongside sparse mask selection.}
}

Endnote

%0 Conference Paper
%T Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization
%A Aleksandra Nowak
%A Łukasz Gniecki
%A Filip Szatkowski
%A Jacek Tabor
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-nowak24a
%I PMLR
%P 38474--38494
%U https://proceedings.mlr.press/v235/nowak24a.html
%V 235
%X Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years. A key design choice is given by the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select such mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask’s potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps in stabilizing the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. Contrary to other existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary densities. We demonstrate the superior effectiveness and efficiency of EOI through experiments, consistently outperforming common sparse initialization techniques. Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques, emphasizing the crucial role of weight initialization in static sparse training alongside sparse mask selection.

APA


Nowak, A., Gniecki, Ł., Szatkowski, F., & Tabor, J. (2024). Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:38474-38494. Available from https://proceedings.mlr.press/v235/nowak24a.html.
