Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation

Guanglong Sun, Hongwei Yan, Liyuan Wang, Qian Li, Bo Lei, Yi Zhong
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:57790-57807, 2025.

Abstract

Knowledge distillation (KD) is a powerful strategy for training deep neural networks (DNNs). While it was originally proposed to train a more compact “student” model from a large “teacher” model, many recent efforts have focused on adapting it to promote generalization of the model itself, as in online KD and self KD. Here, we propose an easy-to-use and compatible strategy named Spaced KD to improve the effectiveness of both online KD and self KD, in which the student model distills knowledge from a teacher model trained a space interval ahead. This strategy is inspired by a prominent theory from biological learning and memory, the spacing effect, which posits that appropriate intervals between learning trials can significantly enhance learning performance. We provide an in-depth theoretical and empirical analysis showing that the benefits of the proposed spacing effect in KD stem from seeking flat minima during stochastic gradient descent (SGD). We perform extensive experiments demonstrating the effectiveness of Spaced KD in improving the learning performance of DNNs (e.g., the additional performance gain is up to 2.31% and 3.34% on Tiny-ImageNet over online KD and self KD, respectively). Our code is available on GitHub at https://github.com/SunGL001/Spaced-KD.
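The abstract describes the method only at a high level. For intuition, below is a minimal PyTorch-style sketch of what an online-KD training loop with such a space interval could look like. It is based solely on the description above, not on the released code: the function names, the batch-buffer mechanism, the KD loss weighting (alpha, temperature T), and the choice of space_interval are all illustrative assumptions; see https://github.com/SunGL001/Spaced-KD for the authors' implementation.

from collections import deque

import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD objective: cross-entropy on hard labels plus
    temperature-scaled KL divergence against the teacher's soft targets.
    (alpha and T are illustrative defaults, not the paper's settings.)"""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kl


def train_spaced_online_kd(teacher, student, loader, opt_t, opt_s,
                           space_interval=100, device="cuda"):
    """Hypothetical online-KD loop where the teacher stays `space_interval`
    batches ahead: the student distills on a batch only after the teacher
    has already trained that many steps past it."""
    buffer = deque()  # batches the teacher has seen but the student has not
    teacher.train()
    student.train()
    for x, y in loader:
        # Teacher step: ordinary supervised update on the newest batch.
        xb, yb = x.to(device), y.to(device)
        opt_t.zero_grad()
        F.cross_entropy(teacher(xb), yb).backward()
        opt_t.step()

        buffer.append((x, y))  # keep on CPU; the student revisits it later
        if len(buffer) <= space_interval:
            continue  # teacher has not yet pulled far enough ahead

        # Student step: distill on the batch the teacher saw `space_interval`
        # updates ago, using the teacher's current (more mature) predictions.
        xs, ys = buffer.popleft()
        xs, ys = xs.to(device), ys.to(device)
        with torch.no_grad():
            t_logits = teacher(xs)
        opt_s.zero_grad()
        kd_loss(student(xs), t_logits, ys).backward()
        opt_s.step()

Under this reading, space_interval plays the role of the "spacing" between the teacher's and the student's exposure to each batch, and a self-KD variant would presumably replace the separately trained teacher with an earlier or concurrently trained copy of the student itself.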

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-sun25w,
  title     = {Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation},
  author    = {Sun, Guanglong and Yan, Hongwei and Wang, Liyuan and Li, Qian and Lei, Bo and Zhong, Yi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {57790--57807},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/sun25w/sun25w.pdf},
  url       = {https://proceedings.mlr.press/v267/sun25w.html},
  abstract  = {Knowledge distillation (KD) is a powerful strategy for training deep neural networks (DNNs). While it was originally proposed to train a more compact “student” model from a large “teacher” model, many recent efforts have focused on adapting it as an effective way to promote generalization of the model itself, such as online KD and self KD. Here, we propose an easy-to-use and compatible strategy named Spaced KD to improve the effectiveness of both online KD and self KD, in which the student model distills knowledge from a teacher model trained with a space interval ahead. This strategy is inspired by a prominent theory named spacing effect in the field of biological learning and memory, positing that appropriate intervals between learning trials can significantly enhance learning performance. We provide an in-depth theoretical and empirical analysis showing that the benefits of the proposed spacing effect in KD stem from seeking a flat minima during stochastic gradient descent (SGD). We perform extensive experiments to demonstrate the effectiveness of our Spaced KD in improving the learning performance of DNNs (e.g., the additional performance gain is up to 2.31% and 3.34% on Tiny-ImageNet over online KD and self KD, respectively). Our codes have been released on github https://github.com/SunGL001/Spaced-KD.}
}
Endnote
%0 Conference Paper
%T Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation
%A Guanglong Sun
%A Hongwei Yan
%A Liyuan Wang
%A Qian Li
%A Bo Lei
%A Yi Zhong
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-sun25w
%I PMLR
%P 57790--57807
%U https://proceedings.mlr.press/v267/sun25w.html
%V 267
%X Knowledge distillation (KD) is a powerful strategy for training deep neural networks (DNNs). While it was originally proposed to train a more compact “student” model from a large “teacher” model, many recent efforts have focused on adapting it as an effective way to promote generalization of the model itself, such as online KD and self KD. Here, we propose an easy-to-use and compatible strategy named Spaced KD to improve the effectiveness of both online KD and self KD, in which the student model distills knowledge from a teacher model trained with a space interval ahead. This strategy is inspired by a prominent theory named spacing effect in the field of biological learning and memory, positing that appropriate intervals between learning trials can significantly enhance learning performance. We provide an in-depth theoretical and empirical analysis showing that the benefits of the proposed spacing effect in KD stem from seeking a flat minima during stochastic gradient descent (SGD). We perform extensive experiments to demonstrate the effectiveness of our Spaced KD in improving the learning performance of DNNs (e.g., the additional performance gain is up to 2.31% and 3.34% on Tiny-ImageNet over online KD and self KD, respectively). Our codes have been released on github https://github.com/SunGL001/Spaced-KD.
APA
Sun, G., Yan, H., Wang, L., Li, Q., Lei, B., & Zhong, Y. (2025). Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:57790-57807. Available from https://proceedings.mlr.press/v267/sun25w.html.
