Weight-Sharing Regularization

Mehran Shakerinava, Motahareh MS Sohrabi, Siamak Ravanbakhsh, Simon Lacoste-Julien
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4204-4212, 2024.

Abstract

Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a “weight-sharing regularization” penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\mathrm{prox}_{\mathcal{R}}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled, while convolutional neural networks fail in this setting. Our code is available at https://github.com/motahareh-sohrabi/weight-sharing-regularization.
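To make the penalty concrete, here is a minimal sketch (not the authors' implementation) of evaluating $\mathcal{R}(w)$ in PyTorch via the naive $O(d^2)$ pairwise formulation; the function name and the usage placeholders (`model`, `loss`, `lam`) are illustrative, and the paper's proximal-gradient training and parallel prox algorithm are not reproduced here.

```python
import torch

def weight_sharing_penalty(w: torch.Tensor) -> torch.Tensor:
    """Naive O(d^2) evaluation of R(w) = 1/(d-1) * sum_{i>j} |w_i - w_j|.

    `w` is a flattened weight vector. This only evaluates the penalty;
    it is a sketch, not the paper's prox-based training procedure.
    """
    w = w.flatten()
    d = w.numel()
    # The full d x d matrix of |w_i - w_j| counts each unordered pair
    # {i, j} twice (and the diagonal is zero), hence the division by 2.
    pairwise = (w.unsqueeze(0) - w.unsqueeze(1)).abs().sum() / 2
    return pairwise / (d - 1)

# Hypothetical usage: add the penalty to a task loss with strength lam.
# reg = sum(weight_sharing_penalty(p) for p in model.parameters())
# total_loss = loss + lam * reg
```

Note that for large layers the same quantity can be computed in $O(d \log d)$ using the standard identity $\sum_{i>j} |w_i - w_j| = \sum_{i} (2i - d + 1)\, w_{(i)}$ for the sorted (ascending, zero-indexed) entries $w_{(i)}$, avoiding the $d \times d$ intermediate tensor.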

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-shakerinava24a,
  title     = {Weight-Sharing Regularization},
  author    = {Shakerinava, Mehran and MS Sohrabi, Motahareh and Ravanbakhsh, Siamak and Lacoste-Julien, Simon},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {4204--4212},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/shakerinava24a/shakerinava24a.pdf},
  url       = {https://proceedings.mlr.press/v238/shakerinava24a.html},
  abstract  = {Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a “weight-sharing regularization” penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\mathrm{prox}_{\mathcal{R}}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled while convolutional neural networks fail in this setting. Our code is available on \href{https://github.com/motahareh-sohrabi/weight-sharing-regularization}{github}.}
}
Endnote
%0 Conference Paper
%T Weight-Sharing Regularization
%A Mehran Shakerinava
%A Motahareh MS Sohrabi
%A Siamak Ravanbakhsh
%A Simon Lacoste-Julien
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-shakerinava24a
%I PMLR
%P 4204--4212
%U https://proceedings.mlr.press/v238/shakerinava24a.html
%V 238
%X Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a “weight-sharing regularization” penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\mathrm{prox}_{\mathcal{R}}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled while convolutional neural networks fail in this setting. Our code is available on \href{https://github.com/motahareh-sohrabi/weight-sharing-regularization}{github}.
APA
Shakerinava, M., MS Sohrabi, M., Ravanbakhsh, S. & Lacoste-Julien, S. (2024). Weight-Sharing Regularization. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4204-4212. Available from https://proceedings.mlr.press/v238/shakerinava24a.html.
