LAuReL: Learned Augmented Residual Layer

Gaurav Menghani, Ravi Kumar, Sanjiv Kumar
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43826-43836, 2025.

Abstract

One of the core pillars of efficient deep learning methods is architectural improvements, such as residual/skip connections, which have led to significantly better model convergence and quality. Since their introduction, residual connections have become ubiquitous not only in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs. In this paper, we introduce the Learned Augmented Residual Layer (LAuReL) — a novel generalization of the canonical residual connection — designed to serve as an in-situ replacement while outperforming it in both model quality and footprint metrics. Our experiments show that LAuReL can enhance quality for both vision and language models while adding fewer parameters and incurring less latency and memory overhead than naively increasing parameter count. For example, on the ImageNet-1K task, LAuReL achieves the same model quality improvements as naively adding an extra layer while using $2.6 \times$ fewer parameters. Similarly, when pre-training 1B and 4B parameter LLMs, LAuReL improves performance on a variety of challenging downstream evaluation tasks by 2.54% to 20.05%, while adding only 0.012% and 0.1% additional parameters, respectively.
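
To make the setting concrete, the sketch below contrasts a canonical residual block, $x_{i+1} = f(x_i) + x_i$, with an illustrative learned-augmented variant in PyTorch. The learned scalar alpha, the low-rank down/up projections, and the rank value are illustrative assumptions chosen only to show how a generalization of the residual connection can add very few parameters; they are not necessarily the exact parameterization used in the paper.

import torch
import torch.nn as nn


class CanonicalResidual(nn.Module):
    """Standard residual block: x_{i+1} = f(x_i) + x_i."""
    def __init__(self, f: nn.Module):
        super().__init__()
        self.f = f

    def forward(self, x):
        return self.f(x) + x


class LearnedAugmentedResidual(nn.Module):
    """Illustrative (hypothetical) generalization of the residual connection:
    x_{i+1} = alpha * f(x_i) + g(x_i), where alpha is a learned scalar and
    g adds a cheap learned low-rank correction to the identity path.
    """
    def __init__(self, f: nn.Module, dim: int, rank: int = 16):
        super().__init__()
        self.f = f
        self.alpha = nn.Parameter(torch.ones(1))
        # Low-rank factors keep the added parameter count small (2 * dim * rank).
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        # Zero-init the up-projection so the block starts as the canonical residual.
        nn.init.zeros_(self.up.weight)

    def forward(self, x):
        residual = x + self.up(self.down(x))  # g(x): identity plus learned low-rank term
        return self.alpha * self.f(x) + residual

At initialization (alpha = 1, zero up-projection) the block behaves exactly like the canonical residual, and the extra parameter count is 2 * dim * rank + 1, which is tiny relative to a typical transformer or convolutional sub-block f.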

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-menghani25a,
  title     = {{LA}u{R}e{L}: Learned Augmented Residual Layer},
  author    = {Menghani, Gaurav and Kumar, Ravi and Kumar, Sanjiv},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {43826--43836},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/menghani25a/menghani25a.pdf},
  url       = {https://proceedings.mlr.press/v267/menghani25a.html},
  abstract  = {One of the core pillars of efficient deep learning methods are architectural improvements, such as residual/skip connections, which have led to significantly better model convergence and quality. Since their introduction, residual connections have become ubiquitous not only in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs. In this paper, we introduce the Learned Augmented Residual Layer (LAuReL) — a novel generalization of the canonical residual connection — designed to serve as an in-situ replacement while outperforming it in both model quality and footprint metrics. Our experiments show that LAuReL can enhance quality for both vision and language models while adding fewer parameters and incurring less latency and memory overhead than naively increasing parameter count. For example, on the ImageNet-1K task, LAuReL achieves the same model quality improvements as naively adding an extra layer while using $2.6 \times$ fewer parameters. Similarly, when pre-training 1B and 4B parameter LLMs, LAuReL improves performance on a variety of challenging downstream evaluation tasks by 2.54% to 20.05%, while adding only 0.012% and 0.1% additional parameters, respectively.}
}
Endnote
%0 Conference Paper
%T LAuReL: Learned Augmented Residual Layer
%A Gaurav Menghani
%A Ravi Kumar
%A Sanjiv Kumar
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-menghani25a
%I PMLR
%P 43826--43836
%U https://proceedings.mlr.press/v267/menghani25a.html
%V 267
%X One of the core pillars of efficient deep learning methods are architectural improvements, such as residual/skip connections, which have led to significantly better model convergence and quality. Since their introduction, residual connections have become ubiquitous not only in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs. In this paper, we introduce the Learned Augmented Residual Layer (LAuReL) — a novel generalization of the canonical residual connection — designed to serve as an in-situ replacement while outperforming it in both model quality and footprint metrics. Our experiments show that LAuReL can enhance quality for both vision and language models while adding fewer parameters and incurring less latency and memory overhead than naively increasing parameter count. For example, on the ImageNet-1K task, LAuReL achieves the same model quality improvements as naively adding an extra layer while using $2.6 \times$ fewer parameters. Similarly, when pre-training 1B and 4B parameter LLMs, LAuReL improves performance on a variety of challenging downstream evaluation tasks by 2.54% to 20.05%, while adding only 0.012% and 0.1% additional parameters, respectively.
APA
Menghani, G., Kumar, R. & Kumar, S. (2025). LAuReL: Learned Augmented Residual Layer. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:43826-43836. Available from https://proceedings.mlr.press/v267/menghani25a.html.
