Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19996-20015, 2025.
Abstract
Discrete Graph Diffusion Models (DGDMs) mark a pivotal advancement in graph generation, effectively preserving sparsity and structural integrity, thereby enhancing the learning of graph data distributions for diverse generative applications. Despite their potential, DGDMs are computationally intensive because they rely on numerous low-parameter yet high-computation operations, which increases the need for inference acceleration. A promising solution to mitigate this issue is model quantization. However, existing quantization techniques for Image Diffusion Models (IDMs) face limitations when applied to DGDMs because the diffusion processes differ, while Large Language Model (LLM) quantization focuses on reducing the memory access latency of loading large parameters, whereas DGDMs are smaller models whose inference is bottlenecked by computation. To fill this gap, we introduce Bit-DGDM, a post-training quantization framework for DGDMs which incorporates two novel ideas: (i) sparse-dense activation quantization, which sparsely models the activation outliers in full precision through adaptively selected, data-free thresholds and quantizes the remaining activations to low-bit, and (ii) ill-conditioned low-rank decomposition, which decomposes the weights into a low-rank component that enables faster inference and an $\alpha$-sparsity matrix that models outliers. Extensive experiments demonstrate that Bit-DGDM not only reduces memory usage relative to the FP32 baseline by up to $2.8\times$ and achieves up to $2.5\times$ speedup, but also achieves comparable performance at ultra-low precision, down to 4-bit.
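As an illustration of idea (i), the following is a minimal sketch of a sparse-dense split of an activation vector: large-magnitude entries are kept in full precision as a sparse map, and the rest are quantized with a symmetric uniform quantizer. It is not the Bit-DGDM implementation. The function names, the `k` parameter, and in particular the threshold rule (mean plus `k` standard deviations of the magnitudes) are assumptions made for illustration; the abstract does not specify how the adaptive, data-free thresholds are chosen.

```python
from statistics import mean, pstdev


def sparse_dense_quantize(x, bits=4, k=2.0):
    """Split x into full-precision outliers and low-bit codes for the rest.

    Hypothetical helper: the threshold rule below is an illustrative
    assumption, not the adaptive data-free rule used by Bit-DGDM.
    """
    mags = [abs(v) for v in x]
    # Assumed rule: treat entries far above the typical magnitude as outliers.
    threshold = mean(mags) + k * pstdev(mags)
    # Sparse part: index -> full-precision value.
    outliers = {i: v for i, v in enumerate(x) if abs(v) > threshold}
    # Dense part: outlier positions are zeroed so they do not stretch the scale.
    dense = [0.0 if i in outliers else v for i, v in enumerate(x)]
    # Symmetric uniform quantization to signed `bits`-bit integer codes.
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in dense), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in dense]
    return codes, scale, outliers


def dequantize(codes, scale, outliers):
    """Rebuild the vector: exact values at outlier positions, scaled codes elsewhere."""
    return [outliers.get(i, c * scale) for i, c in enumerate(codes)]


if __name__ == "__main__":
    acts = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, 0.25, 0.0, -0.3, 0.12, 50.0]
    codes, scale, outliers = sparse_dense_quantize(acts, bits=4, k=2.0)
    print("outlier positions:", sorted(outliers))
    print("4-bit codes:", codes)
    print("reconstruction:", dequantize(codes, scale, outliers))
```

Without the split, the single large entry would set the quantization scale and push most of the small activations into only a few integer levels; isolating it lets the low-bit grid cover just the range of the remaining values.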