Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models

Zheng Gong; Ying Sun

Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models

Zheng Gong, Ying Sun

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19996-20015, 2025.

Abstract

Discrete Graph Diffusion Models (DGDMs) mark a pivotal advancement in graph generation, effectively preserving sparsity and structural integrity, thereby enhancing the learning of graph data distributions for diverse generative applications. Despite their potential, DGDMs are computationally intensive due to the numerous low-parameter yet high-computation operations, thereby increasing the need of inference acceleration. A promising solution to mitigate this issue is model quantization. However, existing quantization techniques for Image Diffusion Models (IDMs) face limitations in DGDMs due to differing diffusion processes, while Large Language Model (LLM) quantization focuses on reducing memory access latency of loading large parameters, unlike DGDMs, where inference bottlenecks are computations due to smaller model sizes. To fill this gap, we introduce Bit-DGDM, a post-training quantization framework for DGDMs which incorporates two novel ideas: (i) sparse-dense activation quantization sparsely modeling the activation outliers through adaptively selected, data-free thresholds in full-precision and quantizing the remaining to low-bit, and (ii) ill-conditioned low-rank decomposition decomposing the weights into low-rank component enable faster inference and an $\alpha$-sparsity matrix that models outliers. Extensive experiments demonstrate that Bit-DGDM not only reducing the memory usage from the FP32 baseline by up to $2.8\times$ and achieve up to $2.5\times$ speedup, but also achieve comparable performance to ultra-low precision of up to 4-bit.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-gong25e,
  title = 	 {Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models},
  author =       {Gong, Zheng and Sun, Ying},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {19996--20015},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gong25e/gong25e.pdf},
  url = 	 {https://proceedings.mlr.press/v267/gong25e.html},
  abstract = 	 {Discrete Graph Diffusion Models (DGDMs) mark a pivotal advancement in graph generation, effectively preserving sparsity and structural integrity, thereby enhancing the learning of graph data distributions for diverse generative applications. Despite their potential, DGDMs are computationally intensive due to the numerous low-parameter yet high-computation operations, thereby increasing the need of inference acceleration. A promising solution to mitigate this issue is model quantization. However, existing quantization techniques for Image Diffusion Models (IDMs) face limitations in DGDMs due to differing diffusion processes, while Large Language Model (LLM) quantization focuses on reducing memory access latency of loading large parameters, unlike DGDMs, where inference bottlenecks are computations due to smaller model sizes. To fill this gap, we introduce Bit-DGDM, a post-training quantization framework for DGDMs which incorporates two novel ideas: (i) sparse-dense activation quantization sparsely modeling the activation outliers through adaptively selected, data-free thresholds in full-precision and quantizing the remaining to low-bit, and (ii) ill-conditioned low-rank decomposition decomposing the weights into low-rank component enable faster inference and an $\alpha$-sparsity matrix that models outliers. Extensive experiments demonstrate that Bit-DGDM not only reducing the memory usage from the FP32 baseline by up to $2.8\times$ and achieve up to $2.5\times$ speedup, but also achieve comparable performance to ultra-low precision of up to 4-bit.}
}

Endnote

%0 Conference Paper
%T Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models
%A Zheng Gong
%A Ying Sun
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu	
%F pmlr-v267-gong25e
%I PMLR
%P 19996--20015
%U https://proceedings.mlr.press/v267/gong25e.html
%V 267
%X Discrete Graph Diffusion Models (DGDMs) mark a pivotal advancement in graph generation, effectively preserving sparsity and structural integrity, thereby enhancing the learning of graph data distributions for diverse generative applications. Despite their potential, DGDMs are computationally intensive due to the numerous low-parameter yet high-computation operations, thereby increasing the need of inference acceleration. A promising solution to mitigate this issue is model quantization. However, existing quantization techniques for Image Diffusion Models (IDMs) face limitations in DGDMs due to differing diffusion processes, while Large Language Model (LLM) quantization focuses on reducing memory access latency of loading large parameters, unlike DGDMs, where inference bottlenecks are computations due to smaller model sizes. To fill this gap, we introduce Bit-DGDM, a post-training quantization framework for DGDMs which incorporates two novel ideas: (i) sparse-dense activation quantization sparsely modeling the activation outliers through adaptively selected, data-free thresholds in full-precision and quantizing the remaining to low-bit, and (ii) ill-conditioned low-rank decomposition decomposing the weights into low-rank component enable faster inference and an $\alpha$-sparsity matrix that models outliers. Extensive experiments demonstrate that Bit-DGDM not only reducing the memory usage from the FP32 baseline by up to $2.8\times$ and achieve up to $2.5\times$ speedup, but also achieve comparable performance to ultra-low precision of up to 4-bit.

APA

Gong, Z. & Sun, Y.. (2025). Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:19996-20015 Available from https://proceedings.mlr.press/v267/gong25e.html.

Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models

Abstract

Cite this Paper

Related Material