Be a Goldfish: Forgetting Bad Conditioning in Sparse Linear Regression via Variational Autoencoders

Kuheli Pratihar, Debdeep Mukhopadhyay
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:49783-49802, 2025.

Abstract

Variational Autoencoders (VAEs), a class of latent-variable generative models, have seen extensive use in high-fidelity synthesis tasks, yet their loss landscape remains poorly understood. Prior theoretical works on VAE loss analysis have focused on their latent-space representational capabilities, both in the optimal and limiting cases. Although these insights have guided better VAE designs, they also often restrict VAEs to problem settings where classical algorithms, such as Principal Component Analysis (PCA), can trivially guarantee globally optimal solutions. In this work, we push the boundaries of our understanding of VAEs beyond these traditional regimes to tackle NP-hard sparse inverse problems, for which no efficient classical algorithms with optimality guarantees are known. Specifically, we examine the nontrivial Sparse Linear Regression (SLR) problem of recovering optimal sparse inputs in the presence of an ill-conditioned design matrix having correlated features. We provably show that, under a linear encoder-decoder architecture incorporating the product of the SLR design matrix with a trainable, sparsity-promoting diagonal matrix, any minimum of VAE loss is guaranteed to be an optimal solution. This property is especially useful for identifying (a) a preconditioning factor that reduces the eigenvalue spread, and (b) the corresponding optimal sparse representation. Lastly, our empirical analysis with different types of design matrices validates these findings and even demonstrates a higher recovery rate at low sparsity where traditional algorithms fail. Overall, this work highlights the flexible nature of the VAE loss, which can be adapted to efficiently solve computationally hard problems under specific constraints.
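
As an illustration of the architecture sketched in the abstract, the following is a minimal PyTorch sketch, not the authors' released code: a VAE with a linear encoder and a linear decoder formed by the fixed SLR design matrix A composed with a trainable, sparsity-promoting diagonal matrix diag(s), trained with the standard ELBO. The class and variable names, the choice of latent dimension equal to the signal dimension, and the unweighted KL term are illustrative assumptions rather than details taken from the paper.

    import torch
    import torch.nn as nn

    class LinearSparseVAE(nn.Module):
        # Sketch of a linear encoder-decoder VAE for sparse linear regression,
        # assuming observations y = A x + noise with a fixed, possibly
        # ill-conditioned design matrix A in R^{m x n} and a sparse signal x.
        def __init__(self, A: torch.Tensor):
            super().__init__()
            m, n = A.shape
            self.register_buffer("A", A)            # fixed SLR design matrix (not trained)
            self.s = nn.Parameter(torch.ones(n))    # trainable, sparsity-promoting diagonal
            self.enc_mu = nn.Linear(m, n)           # linear encoder: mean of q(z | y)
            self.enc_logvar = nn.Linear(m, n)       # linear encoder: log-variance of q(z | y)

        def forward(self, y):
            mu, logvar = self.enc_mu(y), self.enc_logvar(y)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
            x_hat = self.s * z                      # diag(s) z: candidate sparse representation
            y_hat = x_hat @ self.A.t()              # decoder output A diag(s) z
            return y_hat, mu, logvar

    def elbo_loss(y, y_hat, mu, logvar):
        # Standard VAE objective: squared-error reconstruction plus KL(q(z | y) || N(0, I)).
        recon = ((y - y_hat) ** 2).sum(dim=-1).mean()
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
        return recon + kl

In this reading, the learned diag(s) plays the role of the preconditioning factor that reduces the eigenvalue spread of A, and diag(s) applied to the encoder mean gives the candidate sparse representation; how the support is finally read off (e.g., by thresholding) is a detail not specified in the abstract.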

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-pratihar25a,
  title     = {Be a Goldfish: Forgetting Bad Conditioning in Sparse Linear Regression via Variational Autoencoders},
  author    = {Pratihar, Kuheli and Mukhopadhyay, Debdeep},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {49783--49802},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/pratihar25a/pratihar25a.pdf},
  url       = {https://proceedings.mlr.press/v267/pratihar25a.html}
}
Endnote
%0 Conference Paper
%T Be a Goldfish: Forgetting Bad Conditioning in Sparse Linear Regression via Variational Autoencoders
%A Kuheli Pratihar
%A Debdeep Mukhopadhyay
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-pratihar25a
%I PMLR
%P 49783--49802
%U https://proceedings.mlr.press/v267/pratihar25a.html
%V 267
APA
Pratihar, K. & Mukhopadhyay, D. (2025). Be a Goldfish: Forgetting Bad Conditioning in Sparse Linear Regression via Variational Autoencoders. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:49783-49802. Available from https://proceedings.mlr.press/v267/pratihar25a.html.
