Weight-Sharing Regularization

Mehran Shakerinava; Motahareh MS Sohrabi; Siamak Ravanbakhsh; Simon Lacoste-Julien

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4204-4212, 2024.

Abstract

Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a “weight-sharing regularization” penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\mathrm{prox}_{\mathcal{R}}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ time when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled, while convolutional neural networks fail in this setting. Our code is available on GitHub: https://github.com/motahareh-sohrabi/weight-sharing-regularization.
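To make the penalty and its proximal mapping concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' code and not the paper's parallel $O(\log^3 d)$ algorithm. It relies on two standard facts: first, with $w_{(1)} \le \dots \le w_{(d)}$ the sorted weights, $\sum_{i>j} |w_i - w_j| = \sum_{k=1}^{d} (2k - d - 1)\, w_{(k)}$, so $\mathcal{R}$ can be evaluated in $O(d \log d)$; second, because $\mathcal{R}$ is convex and permutation-invariant, $\mathrm{prox}_{\lambda\mathcal{R}}(v)$ preserves the ordering of $v$, so it reduces to sorting, shifting by $\lambda c$ with $c_k = (2k - d - 1)/(d - 1)$, and an isotonic (nondecreasing) least-squares projection solved by the pool-adjacent-violators algorithm. The function names `weight_sharing_penalty`, `prox_weight_sharing` and `_pava_nondecreasing` are hypothetical, and this sequential reduction is not claimed to be one of the specific algorithms the paper parallelizes.

```python
import numpy as np


def weight_sharing_penalty(w):
    """R(w) = 1/(d-1) * sum_{i>j} |w_i - w_j|, evaluated in O(d log d) by sorting.

    With s = sort(w) ascending and 0-based index k, the pairwise sum equals
    sum_k (2k - d + 1) * s[k] (the 1-based form is 2k - d - 1).
    """
    w = np.asarray(w, dtype=float)
    d = w.size
    if d < 2:
        return 0.0  # no pairs, so no penalty
    s = np.sort(w)
    coeffs = 2 * np.arange(d) - d + 1
    return float(np.dot(coeffs, s)) / (d - 1)


def _pava_nondecreasing(y):
    """Least-squares projection of y onto nondecreasing sequences (pool adjacent violators)."""
    sums, counts = [], []
    for value in y:
        sums.append(float(value))
        counts.append(1)
        # Merge the last two blocks while the earlier block's mean exceeds the later one's
        # (cross-multiplied to compare means without division).
        while len(sums) > 1 and sums[-2] * counts[-1] > sums[-1] * counts[-2]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    # Every pooled block takes the mean of its members.
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])


def prox_weight_sharing(v, lam):
    """argmin_x 0.5 * ||x - v||^2 + lam * R(x), via sort + shift + isotonic projection."""
    v = np.asarray(v, dtype=float)
    d = v.size
    if d < 2:
        return v.copy()  # R is identically zero, so the prox is the identity
    order = np.argsort(v, kind="stable")
    # Nondecreasing weights c_k (0-based) that sum to zero.
    c = (2 * np.arange(d) - d + 1) / (d - 1)
    x_sorted = _pava_nondecreasing(v[order] - lam * c)
    x = np.empty(d)
    x[order] = x_sorted  # undo the sort
    return x


print(prox_weight_sharing([3.0, 0.0, 1.0], 0.3))
print(prox_weight_sharing([3.0, 0.0, 1.0], 2.0))
```

With a small `lam` every weight moves toward the others without collisions (here to `[2.7, 0.3, 1.0]`); with a large `lam` the pooled blocks grow until all entries share one value, which is the mean of `v` because the shift vector `c` sums to zero and pooling preserves sums.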

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-shakerinava24a,
  title = 	 {Weight-Sharing Regularization},
  author =       {Shakerinava, Mehran and MS Sohrabi, Motahareh and Ravanbakhsh, Siamak and Lacoste-Julien, Simon},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {4204--4212},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/shakerinava24a/shakerinava24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/shakerinava24a.html},
  abstract = 	 {Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a “weight-sharing regularization” penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\mathrm{prox}_{\mathcal{R}}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled while convolutional neural networks fail in this setting. Our code is available on \href{https://github.com/motahareh-sohrabi/weight-sharing-regularization}{github}.}
}

Endnote

%0 Conference Paper
%T Weight-Sharing Regularization
%A Mehran Shakerinava
%A Motahareh MS Sohrabi
%A Siamak Ravanbakhsh
%A Simon Lacoste-Julien
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-shakerinava24a
%I PMLR
%P 4204--4212
%U https://proceedings.mlr.press/v238/shakerinava24a.html
%V 238
%X Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a “weight-sharing regularization” penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\mathrm{prox}_{\mathcal{R}}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled while convolutional neural networks fail in this setting. Our code is available on \href{https://github.com/motahareh-sohrabi/weight-sharing-regularization}{github}.

APA

Shakerinava, M., MS Sohrabi, M., Ravanbakhsh, S. & Lacoste-Julien, S. (2024). Weight-Sharing Regularization. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4204-4212. Available from https://proceedings.mlr.press/v238/shakerinava24a.html.
