LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

Brian Trippe, Jonathan Huggins, Raj Agrawal, Tamara Broderick
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6315-6324, 2019.

Abstract

Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational–statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
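
To make the approach concrete, here is a minimal Python sketch of the LR-GLM idea applied to Bayesian logistic regression. It is an illustration, not the authors' implementation: the design matrix is replaced by a rank-r projection obtained from a truncated SVD, a Laplace approximation is computed in the r-dimensional projected space, and the Gaussian result is mapped back to the full parameter space, with the prior retained in the directions the projection discards. The function name lr_laplace_logistic, the isotropic Gaussian prior, and the Newton-iteration details are illustrative assumptions.

    import numpy as np

    def lr_laplace_logistic(X, y, rank, prior_var=1.0, max_iter=50, tol=1e-8):
        """Sketch of an LR-GLM-style Laplace approximation for logistic regression.

        X: (N, D) design matrix; y: (N,) labels in {0, 1}; rank: r << D.
        Returns the mean (D,) and covariance (D, D) of a Gaussian
        posterior approximation. Illustrative sketch only.
        """
        # Rank-r data approximation X ~= X V V^T, with V the top-r right
        # singular vectors. (At scale one would use a randomized or sparse
        # truncated SVD here instead of a full SVD.)
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        V = Vt[:rank].T              # (D, r) projection matrix
        Z = X @ V                    # (N, r) reduced design matrix

        # Newton's method for the MAP of the r-dimensional projected model
        # under an isotropic Gaussian prior N(0, prior_var * I).
        gamma = np.zeros(rank)
        for _ in range(max_iter):
            p = 1.0 / (1.0 + np.exp(-Z @ gamma))        # predicted probabilities
            grad = Z.T @ (y - p) - gamma / prior_var    # gradient of log posterior
            W = p * (1.0 - p)                           # logistic weights
            H = Z.T @ (Z * W[:, None]) + np.eye(rank) / prior_var
            step = np.linalg.solve(H, grad)
            gamma += step
            if np.linalg.norm(step) < tol:
                break

        # Map back to D dimensions: the approximate likelihood is flat in
        # directions orthogonal to V, so the prior survives there, while the
        # r x r Laplace covariance governs the retained subspace.
        mean = V @ gamma
        cov = prior_var * (np.eye(X.shape[1]) - V @ V.T) + V @ np.linalg.inv(H) @ V.T
        return mean, cov

With a genuinely truncated SVD, the expensive operations in this sketch cost O(NDr) or O(r^3) rather than the O(D^3) of a full-rank Laplace approximation, which is the source of the computational–statistical trade-off governed by the rank r. For very large D one would also keep the covariance in the factored form above rather than materializing the D-by-D matrix.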

Cite this Paper

BibTeX
@InProceedings{pmlr-v97-trippe19a,
  title     = {{LR}-{GLM}: High-Dimensional {B}ayesian Inference Using Low-Rank Data Approximations},
  author    = {Trippe, Brian and Huggins, Jonathan and Agrawal, Raj and Broderick, Tamara},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {6315--6324},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/trippe19a/trippe19a.pdf},
  url       = {https://proceedings.mlr.press/v97/trippe19a.html},
  abstract  = {Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational–statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.}
}
Endnote
%0 Conference Paper
%T LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations
%A Brian Trippe
%A Jonathan Huggins
%A Raj Agrawal
%A Tamara Broderick
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-trippe19a
%I PMLR
%P 6315--6324
%U https://proceedings.mlr.press/v97/trippe19a.html
%V 97
%X Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational–statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
APA
Trippe, B., Huggins, J., Agrawal, R. & Broderick, T. (2019). LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6315-6324. Available from https://proceedings.mlr.press/v97/trippe19a.html.
