Optimizing Trajectory Matching Distillation via Parameter Difference-Driven Pruning

Xinyu Cao, He Liu, Liyuan Zhang, Zhongliang Kan
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:415-430, 2025.

Abstract

Dataset distillation aims to give models trained on synthetic datasets the same performance as models trained on the complete real datasets. Trajectory matching distillation, an efficient dataset distillation method, achieves this goal by accurately matching the training trajectories induced by the target dataset and the synthetic dataset. The training trajectory is a time series of the agent model's parameters, where each step records the network parameters of every layer in the agent model; in other words, trajectory matching distillation works by matching network parameters between models trained on the target dataset and on the synthetic dataset. However, because the teacher and student networks are trained on different datasets, their network parameters can be difficult to align during distillation. This paper therefore proposes Difference-Driven Pruning Distillation (DPD), an innovative approach that prunes the difficult-to-align parameters according to the magnitude of their differences, alleviating the alignment problem. Comparative experimental results show that DPD achieves a significant performance improvement, with a greatly reduced memory footprint and superior performance on several benchmarks.
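The core mechanism described in the abstract — comparing teacher and student trajectory parameters element-wise and pruning the hardest-to-align ones before computing the matching loss — can be sketched roughly as below. This is an illustrative reconstruction, not the authors' implementation: the `prune_ratio` hyperparameter, the flat-vector parameter representation, and the squared-error loss form are all assumptions.

```python
import numpy as np

def pruned_matching_loss(student_params, teacher_params, prune_ratio=0.1):
    """Trajectory-matching loss that excludes the hardest-to-align parameters.

    Parameters whose absolute student-teacher difference is largest are
    pruned (masked out) before the squared matching loss is computed.
    """
    diff = np.abs(student_params - teacher_params)
    k = int(len(diff) * prune_ratio)  # number of parameters to prune
    mask = np.ones_like(diff, dtype=bool)
    if k > 0:
        # indices of the k largest differences (hardest to align)
        pruned = np.argpartition(diff, -k)[-k:]
        mask[pruned] = False
    d = student_params[mask] - teacher_params[mask]
    return float(np.sum(d * d))

# Toy example: the last parameter is a hard-to-align outlier.
s = np.array([1.0, 2.0, 3.0, 10.0])
t = np.array([1.1, 2.0, 2.9, 0.0])
loss_full = float(np.sum((s - t) ** 2))               # includes the outlier
loss_pruned = pruned_matching_loss(s, t, prune_ratio=0.25)  # outlier pruned
```

With a quarter of the parameters pruned, the outlier's contribution is dropped and the loss reflects only the parameters that can plausibly be aligned, which is the intuition the abstract gives for why pruning eases the matching problem.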

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-cao25a,
  title     = {Optimizing Trajectory Matching Distillation via Parameter Difference-Driven Pruning},
  author    = {Cao, Xinyu and Liu, He and Zhang, Liyuan and Kan, Zhongliang},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {415--430},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/cao25a/cao25a.pdf},
  url       = {https://proceedings.mlr.press/v304/cao25a.html},
  abstract  = {Dataset distillation aims to give models trained on synthetic datasets the same performance as models trained with complete real datasets. Trajectory matching distillation, as an efficient dataset distillation method, achieves this goal gradually by accurately matching the dynamic trajectories of the target dataset and the synthetic dataset during the training process. Where the training trajectory is composed of the time series parameters of the agent model, and each time series contains the network parameters of all the layers in the agent model, i.e., trajectory matching distillation achieves its goal by matching the network parameters between the target dataset and the synthetic dataset. However, the variability of the training datasets used by the teacher and student networks can lead to the problem of difficult alignment of network parameters during the distillation process, so this paper proposes Difference-Driven Pruning Distillation (DPD), an innovative approach to pruning the difficult-to-align parameters according to the magnitude of the difference in parameter comparisons to alleviate the above problem. Comparative experimental results show that DPD achieves a significant performance improvement, with a greatly reduced memory footprint and superior performance in several benchmarks.}
}
Endnote
%0 Conference Paper
%T Optimizing Trajectory Matching Distillation via Parameter Difference-Driven Pruning
%A Xinyu Cao
%A He Liu
%A Liyuan Zhang
%A Zhongliang Kan
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-cao25a
%I PMLR
%P 415--430
%U https://proceedings.mlr.press/v304/cao25a.html
%V 304
%X Dataset distillation aims to give models trained on synthetic datasets the same performance as models trained with complete real datasets. Trajectory matching distillation, as an efficient dataset distillation method, achieves this goal gradually by accurately matching the dynamic trajectories of the target dataset and the synthetic dataset during the training process. Where the training trajectory is composed of the time series parameters of the agent model, and each time series contains the network parameters of all the layers in the agent model, i.e., trajectory matching distillation achieves its goal by matching the network parameters between the target dataset and the synthetic dataset. However, the variability of the training datasets used by the teacher and student networks can lead to the problem of difficult alignment of network parameters during the distillation process, so this paper proposes Difference-Driven Pruning Distillation (DPD), an innovative approach to pruning the difficult-to-align parameters according to the magnitude of the difference in parameter comparisons to alleviate the above problem. Comparative experimental results show that DPD achieves a significant performance improvement, with a greatly reduced memory footprint and superior performance in several benchmarks.
APA
Cao, X., Liu, H., Zhang, L. &amp; Kan, Z. (2025). Optimizing Trajectory Matching Distillation via Parameter Difference-Driven Pruning. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:415-430. Available from https://proceedings.mlr.press/v304/cao25a.html.

Related Material