Layer-wise Quantization for Quantized Optimistic Dual Averaging

Anh Duc Nguyen, Ilia Markov, Zhengqing Wu, Ali Ramezani-Kebrya, Kimon Antonakopoulos, Dan Alistarh, Volkan Cevher
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:46026-46072, 2025.

Abstract

Modern deep neural networks exhibit heterogeneity across their many layers of various types, such as residual and multi-head attention layers, owing to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to these heterogeneities over the course of training. We then apply the new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150$% speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.
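
The abstract describes the framework only at a high level. As a rough illustration, below is a minimal sketch of per-layer unbiased stochastic quantization, assuming NumPy gradients grouped by layer. The names quantize_layer, quantize_layerwise, and levels_by_layer are illustrative placeholders rather than the paper's API, and the sketch omits the paper's adaptive level selection, variance bounds, and coding scheme.

import numpy as np

def quantize_layer(grad, num_levels, rng=None):
    """Unbiased stochastic quantization of one layer's gradient onto
    num_levels uniform levels in [0, 1], scaled by the layer's max magnitude."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.max(np.abs(grad))
    if scale == 0.0:
        return np.zeros_like(grad)
    normalized = np.abs(grad) / scale               # entries now lie in [0, 1]
    lower = np.floor(normalized * num_levels)       # nearest quantization level below
    prob_up = normalized * num_levels - lower       # randomized rounding keeps the estimate unbiased
    quantized = (lower + (rng.random(grad.shape) < prob_up)) / num_levels
    return np.sign(grad) * scale * quantized

def quantize_layerwise(grads_by_layer, levels_by_layer):
    # Each layer gets its own number of levels, so heterogeneous layers
    # can receive heterogeneous quantization budgets (the core layer-wise idea).
    return [quantize_layer(g, L) for g, L in zip(grads_by_layer, levels_by_layer)]

In a distributed setting, each worker would quantize its per-layer gradients with such a scheme before communication, trading a small, unbiased amount of noise for fewer transmitted bits.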

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-nguyen25d,
  title     = {Layer-wise Quantization for Quantized Optimistic Dual Averaging},
  author    = {Nguyen, Anh Duc and Markov, Ilia and Wu, Zhengqing and Ramezani-Kebrya, Ali and Antonakopoulos, Kimon and Alistarh, Dan and Cevher, Volkan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {46026--46072},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/nguyen25d/nguyen25d.pdf},
  url       = {https://proceedings.mlr.press/v267/nguyen25d.html},
  abstract  = {Modern deep neural networks exhibit heterogeneity across numerous layers of various types such as residuals, multi-head attention, etc., due to varying structures (dimensions, activation functions, etc.), distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150$% speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.}
}
Endnote
%0 Conference Paper
%T Layer-wise Quantization for Quantized Optimistic Dual Averaging
%A Anh Duc Nguyen
%A Ilia Markov
%A Zhengqing Wu
%A Ali Ramezani-Kebrya
%A Kimon Antonakopoulos
%A Dan Alistarh
%A Volkan Cevher
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-nguyen25d
%I PMLR
%P 46026--46072
%U https://proceedings.mlr.press/v267/nguyen25d.html
%V 267
%X Modern deep neural networks exhibit heterogeneity across numerous layers of various types such as residuals, multi-head attention, etc., due to varying structures (dimensions, activation functions, etc.), distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150$% speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.
APA
Nguyen, A. D., Markov, I., Wu, Z., Ramezani-Kebrya, A., Antonakopoulos, K., Alistarh, D., & Cevher, V. (2025). Layer-wise Quantization for Quantized Optimistic Dual Averaging. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:46026-46072. Available from https://proceedings.mlr.press/v267/nguyen25d.html.
