Layer-wise Quantization for Quantized Optimistic Dual Averaging

Anh Duc Nguyen, Ilia Markov, Zhengqing Wu, Ali Ramezani-Kebrya, Kimon Antonakopoulos, Dan Alistarh, Volkan Cevher
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:46026-46072, 2025.

Abstract

Modern deep neural networks exhibit heterogeneity across their many layers of various types, such as residual and multi-head attention layers, owing to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to these heterogeneities over the course of training. We then apply the new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150$% speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.
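
The abstract describes the framework only at a high level. As a rough illustration, below is a minimal sketch of per-layer unbiased stochastic quantization, assuming NumPy gradients grouped by layer. The names quantize_layer, quantize_layerwise, and levels_by_layer are illustrative placeholders rather than the paper's API, and the sketch omits the paper's adaptive level selection, variance bounds, and coding scheme.

import numpy as np

def quantize_layer(grad, num_levels, rng=None):
    """Unbiased stochastic quantization of one layer's gradient onto
    num_levels uniform levels in [0, 1], scaled by the layer's max magnitude."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.max(np.abs(grad))
    if scale == 0.0:
        return np.zeros_like(grad)
    normalized = np.abs(grad) / scale               # entries now lie in [0, 1]
    lower = np.floor(normalized * num_levels)       # nearest quantization level below
    prob_up = normalized * num_levels - lower       # randomized rounding keeps the estimate unbiased
    quantized = (lower + (rng.random(grad.shape) < prob_up)) / num_levels
    return np.sign(grad) * scale * quantized

def quantize_layerwise(grads_by_layer, levels_by_layer):
    # Each layer gets its own number of levels, so heterogeneous layers
    # can receive heterogeneous quantization budgets (the core layer-wise idea).
    return [quantize_layer(g, L) for g, L in zip(grads_by_layer, levels_by_layer)]

In a distributed setting, each worker would quantize its per-layer gradients with such a scheme before communication, trading a small, unbiased amount of noise for fewer transmitted bits.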

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-nguyen25d,
  title     = {Layer-wise Quantization for Quantized Optimistic Dual Averaging},
  author    = {Nguyen, Anh Duc and Markov, Ilia and Wu, Zhengqing and Ramezani-Kebrya, Ali and Antonakopoulos, Kimon and Alistarh, Dan and Cevher, Volkan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {46026--46072},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/nguyen25d/nguyen25d.pdf},
  url       = {https://proceedings.mlr.press/v267/nguyen25d.html},
  abstract  = {Modern deep neural networks exhibit heterogeneity across numerous layers of various types such as residuals, multi-head attention, etc., due to varying structures (dimensions, activation functions, etc.), distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150$% speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.}
}
Endnote
%0 Conference Paper
%T Layer-wise Quantization for Quantized Optimistic Dual Averaging
%A Anh Duc Nguyen
%A Ilia Markov
%A Zhengqing Wu
%A Ali Ramezani-Kebrya
%A Kimon Antonakopoulos
%A Dan Alistarh
%A Volkan Cevher
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-nguyen25d
%I PMLR
%P 46026--46072
%U https://proceedings.mlr.press/v267/nguyen25d.html
%V 267
%X Modern deep neural networks exhibit heterogeneity across numerous layers of various types such as residuals, multi-head attention, etc., due to varying structures (dimensions, activation functions, etc.), distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150$% speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.
APA
Nguyen, A. D., Markov, I., Wu, Z., Ramezani-Kebrya, A., Antonakopoulos, K., Alistarh, D., & Cevher, V. (2025). Layer-wise Quantization for Quantized Optimistic Dual Averaging. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:46026-46072. Available from https://proceedings.mlr.press/v267/nguyen25d.html.
