On the Diminishing Returns of Width for Continual Learning

Etash Kumar Guha, Vihan Lakshman
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:16706-16730, 2024.

Abstract

While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from catastrophic forgetting when trained on new tasks in sequence. Several works have empirically demonstrated that increasing the width of a neural network leads to a decrease in catastrophic forgetting but have yet to characterize the exact relationship between width and continual learning. We design one of the first frameworks to analyze continual learning theory and prove that width is directly related to forgetting in feed-forward networks (FFNs), demonstrating the diminishing returns of increasing width in reducing forgetting. We empirically verify our claims at widths unexplored in prior studies, where the diminishing returns are clearly observed as predicted by our theory.
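
The sketch below illustrates the kind of measurement the abstract describes; it is not the authors' experimental protocol. It trains a two-layer FFN of increasing width on two synthetic tasks in sequence and reports the drop in first-task accuracy as a simple forgetting measure. The tasks, widths, and hyperparameters are all illustrative assumptions.

# Minimal sketch (not the paper's setup): measure how forgetting on a first
# task changes as FFN width grows. Tasks are synthetic classification problems
# with different random labelings, standing in for a sequence of tasks.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(n=2000, d=100, classes=10):
    # Each task: same Gaussian inputs, a different random linear labeling.
    X = torch.randn(n, d)
    w = torch.randn(d, classes)
    y = (X @ w).argmax(dim=1)
    return X, y

def accuracy(model, X, y):
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

def train(model, X, y, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

d, classes = 100, 10
task_a, task_b = make_task(d=d, classes=classes), make_task(d=d, classes=classes)

for width in [64, 256, 1024, 4096]:
    model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, classes))
    train(model, *task_a)                 # learn task A
    acc_before = accuracy(model, *task_a)
    train(model, *task_b)                 # then learn task B sequentially
    acc_after = accuracy(model, *task_a)  # re-evaluate task A
    print(f"width={width:5d}  forgetting={acc_before - acc_after:.3f}")

If the trend described in the paper holds, the forgetting values should shrink as width grows, but with visibly smaller gains at the largest widths.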

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-guha24a,
  title     = {On the Diminishing Returns of Width for Continual Learning},
  author    = {Guha, Etash Kumar and Lakshman, Vihan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {16706--16730},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/guha24a/guha24a.pdf},
  url       = {https://proceedings.mlr.press/v235/guha24a.html},
  abstract  = {While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from catastrophic forgetting when trained on new tasks in sequence. Several works have empirically demonstrated that increasing the width of a neural network leads to a decrease in catastrophic forgetting but have yet to characterize the exact relationship between width and continual learning. We design one of the first frameworks to analyze Continual Learning Theory and prove that width is directly related to forgetting in Feed-Forward Networks (FFN), demonstrating that the diminishing returns of increasing widths to reduce forgetting. We empirically verify our claims at widths hitherto unexplored in prior studies where the diminishing returns are clearly observed as predicted by our theory.}
}
Endnote
%0 Conference Paper
%T On the Diminishing Returns of Width for Continual Learning
%A Etash Kumar Guha
%A Vihan Lakshman
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-guha24a
%I PMLR
%P 16706--16730
%U https://proceedings.mlr.press/v235/guha24a.html
%V 235
%X While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from catastrophic forgetting when trained on new tasks in sequence. Several works have empirically demonstrated that increasing the width of a neural network leads to a decrease in catastrophic forgetting but have yet to characterize the exact relationship between width and continual learning. We design one of the first frameworks to analyze Continual Learning Theory and prove that width is directly related to forgetting in Feed-Forward Networks (FFN), demonstrating that the diminishing returns of increasing widths to reduce forgetting. We empirically verify our claims at widths hitherto unexplored in prior studies where the diminishing returns are clearly observed as predicted by our theory.
APA
Guha, E.K. & Lakshman, V. (2024). On the Diminishing Returns of Width for Continual Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:16706-16730. Available from https://proceedings.mlr.press/v235/guha24a.html.