Overcoming Catastrophic Forgetting with Hard Attention to the Task

Joan Serra, Didac Suris, Marius Miron, Alexandros Karatzoglou
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4548-4557, 2018.

Abstract

Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks’ information without affecting the current task’s learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe makes it also attractive for online learning or network compression applications.
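As a concrete illustration of the mechanism summarized in the abstract, below is a minimal PyTorch sketch of a task-conditioned hard attention gate and of conditioning learning on previous tasks' masks. It is not the authors' released implementation: the names (HardAttentionLayer, anneal_s, constrain_gradients), the gate-scale schedule, and the simplified per-layer gradient constraint are illustrative assumptions made for this sketch.

# Minimal sketch of a task-conditioned hard attention gate (HAT-style).
# Assumes PyTorch; names and the simplified gradient constraint are illustrative.
import torch
import torch.nn as nn


class HardAttentionLayer(nn.Module):
    """A linear layer whose units are gated by a per-task, near-binary mask."""

    def __init__(self, n_tasks: int, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One learnable embedding per task, turned into a per-unit mask.
        self.task_embedding = nn.Embedding(n_tasks, out_features)

    def mask(self, task_id: torch.Tensor, s: float) -> torch.Tensor:
        # Scaled sigmoid: as s grows, the gate approaches a hard (0/1) mask
        # while remaining differentiable for stochastic gradient descent.
        return torch.sigmoid(s * self.task_embedding(task_id))

    def forward(self, x: torch.Tensor, task_id: torch.Tensor, s: float) -> torch.Tensor:
        a = self.mask(task_id, s)              # shape (1, out_features)
        return torch.relu(self.linear(x)) * a  # gate the layer's units


def anneal_s(batch_idx: int, n_batches: int, s_max: float = 400.0) -> float:
    """Anneal the gate scale from roughly 1/s_max to s_max over one epoch."""
    frac = batch_idx / max(1, n_batches - 1)
    return 1.0 / s_max + (s_max - 1.0 / s_max) * frac


def constrain_gradients(layer: HardAttentionLayer, prev_mask: torch.Tensor) -> None:
    """Condition learning on previous tasks: scale down gradients of weights
    attached to units that earlier tasks' masks marked as important.
    prev_mask: element-wise maximum of previous tasks' masks, shape (out_features,)."""
    if layer.linear.weight.grad is not None:
        layer.linear.weight.grad.mul_((1.0 - prev_mask).unsqueeze(1))
    if layer.linear.bias.grad is not None:
        layer.linear.bias.grad.mul_(1.0 - prev_mask)

In such a setup, constrain_gradients would be called after loss.backward() and before optimizer.step(); once a task is finished, its (binarized) mask would be folded into prev_mask with an element-wise maximum so that subsequent tasks cannot overwrite the units it relies on.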

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-serra18a,
  title     = {Overcoming Catastrophic Forgetting with Hard Attention to the Task},
  author    = {Serra, Joan and Suris, Didac and Miron, Marius and Karatzoglou, Alexandros},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {4548--4557},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/serra18a/serra18a.pdf},
  url       = {https://proceedings.mlr.press/v80/serra18a.html}
}
Endnote
%0 Conference Paper
%T Overcoming Catastrophic Forgetting with Hard Attention to the Task
%A Joan Serra
%A Didac Suris
%A Marius Miron
%A Alexandros Karatzoglou
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-serra18a
%I PMLR
%P 4548--4557
%U https://proceedings.mlr.press/v80/serra18a.html
%V 80
APA
Serra, J., Suris, D., Miron, M. & Karatzoglou, A. (2018). Overcoming Catastrophic Forgetting with Hard Attention to the Task. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4548-4557. Available from https://proceedings.mlr.press/v80/serra18a.html.