Price of metric universality in vector quantization is at most 0.11 bit

Alina Harbuzova; Or Ordentlich; Yury Polyanskiy

Price of metric universality in vector quantization is at most 0.11 bit

Alina Harbuzova, Or Ordentlich, Yury Polyanskiy

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:3143-3183, 2026.

Abstract

Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is that of using a low-precision approximation $\widehat W$ in place of true $W$ (“weight-only quantization”). Information theory demonstrates that an optimal algorithm for reducing precision of $W$ depends on the (second order) statistics of $X$ and requires a careful alignment of vector quantization codebook with PCA directions of $X$ (a process known as “waterfilling allocation”). Dependence of the codebook on statistics of $X$, however, is highly impractical. This paper proves that there exist a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows existence of a net in $\mathbb{R}^n$ that is a nearly-optimal covering of a sphere simultaneously with respect to all Hilbert norms.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-harbuzova26a,
  title = 	 {Price of metric universality in vector quantization is at most 0.11 bit},
  author =       {Harbuzova, Alina and Ordentlich, Or and Polyanskiy, Yury},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {3143--3183},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/harbuzova26a/harbuzova26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/harbuzova26a.html},
  abstract = 	 {Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is that of using a low-precision approximation $\widehat W$ in place of true $W$ (“weight-only quantization”). Information theory demonstrates that an optimal algorithm  for reducing precision of $W$ depends on the (second order) statistics of $X$ and requires a careful alignment of vector quantization codebook with PCA directions of $X$ (a process known as “waterfilling allocation”). Dependence of the codebook on statistics of $X$, however, is highly impractical. This paper proves that there exist a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows existence of a net in $\mathbb{R}^n$ that is a nearly-optimal covering of a sphere simultaneously with respect to all Hilbert norms. }
}

Endnote

%0 Conference Paper
%T Price of metric universality in vector quantization is at most 0.11 bit
%A Alina Harbuzova
%A Or Ordentlich
%A Yury Polyanskiy
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-harbuzova26a
%I PMLR
%P 3143--3183
%U https://proceedings.mlr.press/v336/harbuzova26a.html
%V 336
%X Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is that of using a low-precision approximation $\widehat W$ in place of true $W$ (“weight-only quantization”). Information theory demonstrates that an optimal algorithm  for reducing precision of $W$ depends on the (second order) statistics of $X$ and requires a careful alignment of vector quantization codebook with PCA directions of $X$ (a process known as “waterfilling allocation”). Dependence of the codebook on statistics of $X$, however, is highly impractical. This paper proves that there exist a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows existence of a net in $\mathbb{R}^n$ that is a nearly-optimal covering of a sphere simultaneously with respect to all Hilbert norms.

APA

Harbuzova, A., Ordentlich, O. & Polyanskiy, Y.. (2026). Price of metric universality in vector quantization is at most 0.11 bit. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:3143-3183 Available from https://proceedings.mlr.press/v336/harbuzova26a.html.

Related Material

Download PDF