Strong Lottery Ticket Hypothesis with $\varepsilon$–perturbation

Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6879-6902, 2023.

Abstract

The strong Lottery Ticket Hypothesis (LTH) (Ramanujan et al., 2019; Zhou et al., 2019) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need for training. We extend the theoretical guarantee of the strong LTH literature to a scenario closer to the original LTH, by generalizing the weight change in the pre-training step to some perturbation around initialization. In particular, we focus on the following open questions: By allowing an $\varepsilon$-scale perturbation on the random initial weights, can we reduce the over-parameterization requirement for the candidate network in the strong LTH? Furthermore, does the weight change induced by SGD coincide with a good set of such perturbations? We answer the first question by extending the theoretical result on the subset sum problem (Lueker, 1998) to allow perturbation of the candidates. Applying this result to the neural network setting, we show that allowing $\varepsilon$-scale perturbation reduces the over-parameterization requirement of the strong LTH by a factor of $O(1/(1+\varepsilon))$. To answer the second question, we show via experiments that the perturbed weights obtained by projected SGD perform better under strong LTH pruning.
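To make the subset-sum intuition concrete, below is a minimal, illustrative sketch (not the paper's construction or experimental setup). It assumes a simplified model in which each selected candidate may be shifted by at most $\varepsilon$ in absolute value, and it brute-forces all subsets to compare how often a random target is approximated within a fixed tolerance with and without the perturbation budget; the function subset_sum_error and all parameter values are hypothetical choices for illustration.

import itertools
import numpy as np

def subset_sum_error(candidates, target, eps):
    # Smallest achievable |target - perturbed subset sum| over all subsets,
    # where each chosen candidate may be shifted by at most eps in absolute value.
    best = abs(target)  # the empty subset
    n = len(candidates)
    for r in range(1, n + 1):
        for idx in itertools.combinations(range(n), r):
            s = candidates[list(idx)].sum()
            # The r chosen candidates can jointly move the sum by up to r * eps
            # toward the target, shrinking the residual error accordingly.
            best = min(best, max(abs(target - s) - r * eps, 0.0))
    return best

n, tol, trials = 8, 0.01, 200
for label, eps in [("no perturbation", 0.0), ("eps-perturbation", 0.05)]:
    rng = np.random.default_rng(0)  # same random draws for both settings
    hits = 0
    for _ in range(trials):
        candidates = rng.uniform(-1.0, 1.0, size=n)
        target = rng.uniform(-0.5, 0.5)
        if subset_sum_error(candidates, target, eps) <= tol:
            hits += 1
    print(f"{label}: {hits}/{trials} random targets approximated within {tol}")

In this toy setting, the perturbation budget effectively widens each subset sum into an interval, so fewer random candidates are needed to cover a target range at a given accuracy; this loosely mirrors the paper's claim that $\varepsilon$-perturbation reduces the over-parameterization requirement, though the paper's analysis, constants, and network setting differ.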

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-xiong23a,
  title     = {Strong Lottery Ticket Hypothesis with $\varepsilon$–perturbation},
  author    = {Xiong, Zheyang and Liao, Fangshuo and Kyrillidis, Anastasios},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {6879--6902},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/xiong23a/xiong23a.pdf},
  url       = {https://proceedings.mlr.press/v206/xiong23a.html},
  abstract  = {The strong Lottery Ticket Hypothesis (LTH) (Ramanujan et al., 2019; Zhou et al., 2019) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need of training. We extend the theoretical guarantee of the strong LTH literature to a scenario more similar to the original LTH, by generalizing the weight change in the pre-training step to some perturbation around initialization. In particular, we focus on the following open questions: By allowing an $\varepsilon$-scale perturbation on the random initial weights, can we reduce the over-parameterization requirement for the candidate network in the strong LTH? Furthermore, does the weight change by SGD coincide with a good set of such perturbation? We answer the first question by first extending the theoretical result on subset sum problem (Lueker, 1998) to allow perturbation on the candidates. Applying this result to the neural network setting, we show that by allowing $\varepsilon$-scale perturbation, we can reduce the over-parameterization requirement of the strong LTH by a factor of $O(1/(1+\varepsilon))$. To answer the second question, we show via experiments that the perturbed weight achieved by the projected SGD shows better performance under the strong LTH pruning.}
}
Endnote
%0 Conference Paper
%T Strong Lottery Ticket Hypothesis with $\varepsilon$–perturbation
%A Zheyang Xiong
%A Fangshuo Liao
%A Anastasios Kyrillidis
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-xiong23a
%I PMLR
%P 6879--6902
%U https://proceedings.mlr.press/v206/xiong23a.html
%V 206
%X The strong Lottery Ticket Hypothesis (LTH) (Ramanujan et al., 2019; Zhou et al., 2019) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need of training. We extend the theoretical guarantee of the strong LTH literature to a scenario more similar to the original LTH, by generalizing the weight change in the pre-training step to some perturbation around initialization. In particular, we focus on the following open questions: By allowing an $\varepsilon$-scale perturbation on the random initial weights, can we reduce the over-parameterization requirement for the candidate network in the strong LTH? Furthermore, does the weight change by SGD coincide with a good set of such perturbation? We answer the first question by first extending the theoretical result on subset sum problem (Lueker, 1998) to allow perturbation on the candidates. Applying this result to the neural network setting, we show that by allowing $\varepsilon$-scale perturbation, we can reduce the over-parameterization requirement of the strong LTH by a factor of $O(1/(1+\varepsilon))$. To answer the second question, we show via experiments that the perturbed weight achieved by the projected SGD shows better performance under the strong LTH pruning.
APA
Xiong, Z., Liao, F. & Kyrillidis, A. (2023). Strong Lottery Ticket Hypothesis with $\varepsilon$–perturbation. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:6879-6902. Available from https://proceedings.mlr.press/v206/xiong23a.html.

Related Material