Bayesian Variable Selection in a Million Dimensions

Martin Jankowiak
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:253-282, 2023.

Abstract

Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient Markov Chain Monte Carlo scheme whose cost per iteration is sublinear in P (though linear in the number of data points). In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-jankowiak23a,
  title     = {Bayesian Variable Selection in a Million Dimensions},
  author    = {Jankowiak, Martin},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {253--282},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/jankowiak23a/jankowiak23a.pdf},
  url       = {https://proceedings.mlr.press/v206/jankowiak23a.html},
  abstract  = {Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient Markov Chain Monte Carlo scheme whose cost per iteration is sublinear in P (though linear in the number of data points). In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.}
}
Endnote
%0 Conference Paper
%T Bayesian Variable Selection in a Million Dimensions
%A Martin Jankowiak
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-jankowiak23a
%I PMLR
%P 253--282
%U https://proceedings.mlr.press/v206/jankowiak23a.html
%V 206
%X Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient Markov Chain Monte Carlo scheme whose cost per iteration is sublinear in P (though linear in the number of data points). In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.
APA
Jankowiak, M. (2023). Bayesian Variable Selection in a Million Dimensions. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:253-282. Available from https://proceedings.mlr.press/v206/jankowiak23a.html.
