Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:24964-25015, 2024.
Abstract
Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it were compressing a Gaussian source with no sparsity. For general data distributions, we give evidence of a phase transition in the shape of the gradient descent minimizer as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just as in the compression of non-sparse data); above the critical sparsity, the minimizer is the identity (up to a permutation). Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already provably reduces the loss, and a suitable multi-layer decoder leads to a further improvement. We validate our findings on image datasets such as CIFAR-10 and MNIST.
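As a rough illustration of the setting (not the paper's implementation), the sketch below samples a sparse Gaussian (Bernoulli-Gaussian) source, compresses each sample to one bit per latent coordinate via a sign nonlinearity, and compares a plain linear decoder against the same decoder followed by a component-wise denoiser. All names and numbers are illustrative assumptions: the weight matrix A is random rather than trained, and the soft threshold merely stands in for the AMP-derived denoising function the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 128, 64, 10_000   # input dim, 1-bit code dim, number of samples
p = 0.3                     # sparsity: probability that a coordinate is nonzero

# Bernoulli-Gaussian ("sparse Gaussian") source: x_i = b_i * g_i
b = rng.random((n, d)) < p
x = b * rng.standard_normal((n, d))

# Shallow 1-bit autoencoder with tied weights: z = sign(A x), x_hat = A^T z
A = rng.standard_normal((m, d)) / np.sqrt(d)   # untrained, for illustration only
z = np.sign(x @ A.T)                           # m bits per sample
x_lin = z @ A                                  # plain linear decoder

# Optimal scalar rescaling of the linear reconstruction under squared loss
alpha = np.sum(x * x_lin) / np.sum(x_lin ** 2)
x_lin = alpha * x_lin

def soft_threshold(u, tau):
    """Component-wise denoiser: pushes small reconstructed values to zero,
    matching the sparse prior (a stand-in for the paper's AMP denoiser)."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

x_den = soft_threshold(x_lin, tau=0.05)

mse = lambda xh: np.mean((x - xh) ** 2)
print(f"linear decoder MSE:   {mse(x_lin):.4f}")
print(f"denoised decoder MSE: {mse(x_den):.4f}")
```

In this toy version, any gain from the denoiser comes from setting small reconstructed coordinates exactly to zero, in line with the Bernoulli-Gaussian prior, and depends on the sparsity p and threshold tau; the paper's provable improvement concerns trained weights and an AMP-based denoiser, which this sketch does not reproduce.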