Maslow’s Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation

Sebastian Lee, Stefano Sarao Mannelli, Claudia Clopath, Sebastian Goldt, Andrew Saxe

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12455-12477, 2022.

Abstract

Continual learning (learning new tasks in sequence while maintaining performance on old tasks) remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon, which we call the Maslow’s Hammer hypothesis. Our analysis reveals a trade-off between node activation and node re-use that results in the worst forgetting in the intermediate regime. Using this understanding, we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off and identify the regimes in which they are most effective.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-lee22g,
  title     = {Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation},
  author    = {Lee, Sebastian and Mannelli, Stefano Sarao and Clopath, Claudia and Goldt, Sebastian and Saxe, Andrew},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {12455--12477},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesv{\'a}ri, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/lee22g/lee22g.pdf},
  url       = {https://proceedings.mlr.press/v162/lee22g.html},
  abstract = 	 {Continual learning—learning new tasks in sequence while maintaining performance on old tasks—remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon that we name Maslow’s Hammer hypothesis. Our analysis reveals the presence of a trade-off between node activation and node re-use that results in worst forgetting in the intermediate regime. Using this understanding we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off, and identify the regimes in which they are most effective.}
}

Endnote

%0 Conference Paper
%T Maslow’s Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation
%A Sebastian Lee
%A Stefano Sarao Mannelli
%A Claudia Clopath
%A Sebastian Goldt
%A Andrew Saxe
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvári
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-lee22g
%I PMLR
%P 12455-12477
%U https://proceedings.mlr.press/v162/lee22g.html
%V 162
%X Continual learning—learning new tasks in sequence while maintaining performance on old tasks—remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon that we name Maslow’s Hammer hypothesis. Our analysis reveals the presence of a trade-off between node activation and node re-use that results in worst forgetting in the intermediate regime. Using this understanding we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off, and identify the regimes in which they are most effective.

APA


Lee, S., Mannelli, S. S., Clopath, C., Goldt, S., & Saxe, A. (2022). Maslow’s Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:12455-12477. Available from https://proceedings.mlr.press/v162/lee22g.html
