AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration

Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Zhao Jin, Dacheng Tao
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:57694-57711, 2025.

Abstract

Diffusion Transformers (DiTs) have proven effective in generating high-quality videos but are hindered by high computational costs. Existing video diffusion sampling acceleration methods often rely on costly fine-tuning or exhibit limited generalization capabilities. We propose Asymmetric Reduction and Restoration (AsymRnR), a training-free and model-agnostic method to accelerate video DiTs. It builds on the observation that redundancies of feature tokens in DiTs vary significantly across different model blocks, denoising steps, and feature types. Our AsymRnR asymmetrically reduces redundant tokens in the attention operation, achieving acceleration with negligible degradation in output quality and, in some cases, even improving it. We also tailored a reduction schedule to distribute the reduction across components adaptively. To further accelerate this process, we introduce a matching cache for more efficient reduction. Backed by theoretical foundations and extensive experimental validation, AsymRnR integrates into state-of-the-art video DiTs and offers substantial speedup.
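The core idea of asymmetric reduction can be illustrated with a small toy sketch (NumPy, not the authors' implementation): only the key/value tokens of an attention operation are reduced, while the queries are kept intact, so the output retains full token resolution. The redundancy proxy used here (a key counts as redundant if it is highly similar to another key) and the `keep_ratio` parameter are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def asymmetric_reduced_attention(q, k, v, keep_ratio=0.5):
    """Toy sketch of asymmetric token reduction (NOT the paper's code).

    Only keys/values are reduced; all queries attend to the reduced set,
    so the attention output keeps the full number of tokens.
    """
    n, d = k.shape
    # Redundancy proxy: cosine similarity of each key to its nearest other key.
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    sim = kn @ kn.T
    np.fill_diagonal(sim, -1.0)           # ignore self-similarity
    redundancy = sim.max(axis=-1)         # high = near-duplicate of another key
    n_keep = max(1, int(n * keep_ratio))
    keep = np.argsort(redundancy)[:n_keep]  # keep the least redundant keys
    k_r, v_r = k[keep], v[keep]
    attn = softmax(q @ k_r.T / np.sqrt(d))  # (n, n_keep): full queries, reduced keys
    return attn @ v_r                       # (n, d): full-resolution output
```

Because the query side is untouched, the output shape matches the input, which is what lets such reductions slot into a DiT block without architectural changes; the paper's contribution lies in scheduling how aggressively each block, step, and feature type is reduced, which this sketch does not model.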

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-sun25r,
  title     = {{A}sym{R}n{R}: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration},
  author    = {Sun, Wenhao and Tu, Rong-Cheng and Liao, Jingyi and Jin, Zhao and Tao, Dacheng},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {57694--57711},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/sun25r/sun25r.pdf},
  url       = {https://proceedings.mlr.press/v267/sun25r.html},
  abstract  = {Diffusion Transformers (DiTs) have proven effective in generating high-quality videos but are hindered by high computational costs. Existing video diffusion sampling acceleration methods often rely on costly fine-tuning or exhibit limited generalization capabilities. We propose Asymmetric Reduction and Restoration (AsymRnR), a training-free and model-agnostic method to accelerate video DiTs. It builds on the observation that redundancies of feature tokens in DiTs vary significantly across different model blocks, denoising steps, and feature types. Our AsymRnR asymmetrically reduces redundant tokens in the attention operation, achieving acceleration with negligible degradation in output quality and, in some cases, even improving it. We also tailored a reduction schedule to distribute the reduction across components adaptively. To further accelerate this process, we introduce a matching cache for more efficient reduction. Backed by theoretical foundations and extensive experimental validation, AsymRnR integrates into state-of-the-art video DiTs and offers substantial speedup.}
}
Endnote
%0 Conference Paper
%T AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
%A Wenhao Sun
%A Rong-Cheng Tu
%A Jingyi Liao
%A Zhao Jin
%A Dacheng Tao
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-sun25r
%I PMLR
%P 57694--57711
%U https://proceedings.mlr.press/v267/sun25r.html
%V 267
%X Diffusion Transformers (DiTs) have proven effective in generating high-quality videos but are hindered by high computational costs. Existing video diffusion sampling acceleration methods often rely on costly fine-tuning or exhibit limited generalization capabilities. We propose Asymmetric Reduction and Restoration (AsymRnR), a training-free and model-agnostic method to accelerate video DiTs. It builds on the observation that redundancies of feature tokens in DiTs vary significantly across different model blocks, denoising steps, and feature types. Our AsymRnR asymmetrically reduces redundant tokens in the attention operation, achieving acceleration with negligible degradation in output quality and, in some cases, even improving it. We also tailored a reduction schedule to distribute the reduction across components adaptively. To further accelerate this process, we introduce a matching cache for more efficient reduction. Backed by theoretical foundations and extensive experimental validation, AsymRnR integrates into state-of-the-art video DiTs and offers substantial speedup.
APA
Sun, W., Tu, R., Liao, J., Jin, Z. & Tao, D. (2025). AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:57694-57711. Available from https://proceedings.mlr.press/v267/sun25r.html.