Do Topological Characteristics Help in Knowledge Distillation?

Jungeun Kim, Junwon You, Dongjin Lee, Ha Young Kim, Jae-Hun Jung
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:24674-24693, 2024.

Abstract

Knowledge distillation (KD) aims to transfer knowledge from larger (teacher) to smaller (student) networks. Previous studies focus on point-to-point or pairwise relationships in embedding features as knowledge and struggle to efficiently transfer the relationships of complex latent spaces. To tackle this issue, we propose a novel KD method called TopKD, which considers the global topology of the latent spaces. We define global topology knowledge using the persistence diagram (PD), which captures comprehensive geometric structures such as the shape of the distribution, multiscale structure, and connectivity, and we introduce a topology distillation loss for teaching this knowledge. To make the PD transferable within reasonable computational time, we employ approximated persistence images of PDs. Through experiments, we support the benefits of using global topology as knowledge and demonstrate the potential of TopKD. Code is available at https://github.com/jekim5418/TopKD
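
To make the idea concrete, below is a minimal, self-contained sketch of the pipeline the abstract describes: summarize a batch of embedding features by a 0-dimensional persistence diagram, rasterize it into an approximate persistence image, and penalize the gap between the teacher's and student's images. This is not the authors' implementation (see the repository above for that); all function names and hyperparameters here (h0_persistence, persistence_image, sigma, resolution, max_val) are illustrative assumptions, and only PyTorch is used.

import torch

def h0_persistence(features: torch.Tensor) -> torch.Tensor:
    """0-dim Vietoris-Rips persistence: every point is born at filtration value 0,
    and the death times are the edge lengths of a minimum spanning tree of the
    point cloud (computed here with Prim's algorithm on pairwise distances)."""
    dist = torch.cdist(features, features)
    n = dist.shape[0]
    in_tree = torch.zeros(n, dtype=torch.bool, device=features.device)
    in_tree[0] = True
    best = dist[0].clone()  # cheapest known connection of each point to the tree
    deaths = []
    for _ in range(n - 1):
        masked = torch.where(in_tree, torch.full_like(best, float("inf")), best)
        j = torch.argmin(masked)
        deaths.append(masked[j])            # edge length that merges two components
        in_tree[j] = True
        best = torch.minimum(best, dist[j])
    deaths = torch.stack(deaths)
    return torch.stack([torch.zeros_like(deaths), deaths], dim=1)  # (birth, death)

def persistence_image(diagram: torch.Tensor, resolution: int = 16,
                      sigma: float = 0.1, max_val: float = 1.0) -> torch.Tensor:
    """Approximate persistence image: a persistence-weighted sum of Gaussians,
    one per diagram point, evaluated on a fixed (birth, persistence) grid."""
    birth = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]
    grid = torch.linspace(0.0, max_val, resolution, device=diagram.device)
    gx, gy = torch.meshgrid(grid, grid, indexing="ij")
    dx = gx[None] - birth[:, None, None]
    dy = gy[None] - pers[:, None, None]
    gauss = torch.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    img = (pers[:, None, None] * gauss).sum(dim=0)
    return img / (img.norm() + 1e-8)

def topology_distillation_loss(teacher_feats: torch.Tensor,
                               student_feats: torch.Tensor) -> torch.Tensor:
    """Penalize the mismatch between teacher and student persistence images.
    Diagrams are rescaled by their largest death time so the two latent spaces
    are compared in a scale-invariant way."""
    def normalized_image(feats: torch.Tensor) -> torch.Tensor:
        d = h0_persistence(feats)
        return persistence_image(d / (d[:, 1].max() + 1e-8))
    pi_t = normalized_image(teacher_feats.detach())   # teacher is fixed
    pi_s = normalized_image(student_feats)
    return torch.nn.functional.mse_loss(pi_s, pi_t)

if __name__ == "__main__":
    teacher = torch.randn(32, 128)                     # teacher embeddings (batch, dim)
    student = torch.randn(32, 64, requires_grad=True)  # student embeddings
    loss = topology_distillation_loss(teacher, student)
    loss.backward()                                    # gradients flow to the student
    print(loss.item())

The scale normalization and the MSE comparison are design choices of this sketch, not necessarily those of TopKD; in practice such a topology term would be added, with a weighting coefficient, to the usual cross-entropy and distillation losses.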

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kim24aj,
  title     = {Do Topological Characteristics Help in Knowledge Distillation?},
  author    = {Kim, Jungeun and You, Junwon and Lee, Dongjin and Kim, Ha Young and Jung, Jae-Hun},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {24674--24693},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kim24aj/kim24aj.pdf},
  url       = {https://proceedings.mlr.press/v235/kim24aj.html},
  abstract  = {Knowledge distillation (KD) aims to transfer knowledge from larger (teacher) to smaller (student) networks. Previous studies focus on point-to-point or pairwise relationships in embedding features as knowledge and struggle to efficiently transfer relationships of complex latent spaces. To tackle this issue, we propose a novel KD method called TopKD, which considers the global topology of the latent spaces. We define global topology knowledge using the persistence diagram (PD) that captures comprehensive geometric structures such as shape of distribution, multiscale structure and connectivity, and the topology distillation loss for teaching this knowledge. To make the PD transferable within reasonable computational time, we employ approximated persistence images of PDs. Through experiments, we support the benefits of using global topology as knowledge and demonstrate the potential of TopKD. Code is available at https://github.com/jekim5418/TopKD}
}
Endnote
%0 Conference Paper
%T Do Topological Characteristics Help in Knowledge Distillation?
%A Jungeun Kim
%A Junwon You
%A Dongjin Lee
%A Ha Young Kim
%A Jae-Hun Jung
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kim24aj
%I PMLR
%P 24674--24693
%U https://proceedings.mlr.press/v235/kim24aj.html
%V 235
%X Knowledge distillation (KD) aims to transfer knowledge from larger (teacher) to smaller (student) networks. Previous studies focus on point-to-point or pairwise relationships in embedding features as knowledge and struggle to efficiently transfer relationships of complex latent spaces. To tackle this issue, we propose a novel KD method called TopKD, which considers the global topology of the latent spaces. We define global topology knowledge using the persistence diagram (PD) that captures comprehensive geometric structures such as shape of distribution, multiscale structure and connectivity, and the topology distillation loss for teaching this knowledge. To make the PD transferable within reasonable computational time, we employ approximated persistence images of PDs. Through experiments, we support the benefits of using global topology as knowledge and demonstrate the potential of TopKD. Code is available at https://github.com/jekim5418/TopKD
APA
Kim, J., You, J., Lee, D., Kim, H.Y. & Jung, J.-H. (2024). Do Topological Characteristics Help in Knowledge Distillation?. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:24674-24693. Available from https://proceedings.mlr.press/v235/kim24aj.html.
