Optimizing Bayesian Neural Networks for Genomic Prediction: A Study on Feature Selection and Architecture

Raeein Bagheri, Yan Yan, Justin Slater
Proceedings of The 39th Canadian Conference on Artificial Intelligence, PMLR 318:187-198, 2026.

Abstract

Genome-wide association studies (GWAS) scan the genome for genetic variants, typically single nucleotide polymorphisms, whose alleles are associated with phenotypic variation across individuals. GWAS and genomic prediction face a core challenge: learning from extremely high-dimensional genotype matrices under limited sample sizes. Bayesian neural networks offer uncertainty-aware prediction and the capacity to represent nonlinear genetic effects, but their practical performance depends on feature selection and architectural choices that interact with the inference mechanism. This paper presents an empirical study that improves a Bayesian neural network pipeline for genomic prediction by tuning input selection strategies, network depth and width, and activation functions under Hamiltonian Monte Carlo inference. We compare three approaches: a deterministic ResNet baseline, a standard “out of the box” Bayesian neural network, and an optimized Bayesian neural network produced through targeted tuning. Results show that feature selection is necessary for stable learning under the large $p$, small $n$ regime and that smooth activations are primary drivers of improved posterior exploration and predictive accuracy. On an Ear Height benchmark from the TASSEL tutorial ecosystem, the optimized BNN achieves a test $R^2$ near $0.68$, outperforming the standard BNN and the deterministic baseline.
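
The abstract does not say which input selection strategy the authors use. As a minimal sketch of what feature selection in the large $p$, small $n$ regime can look like, the example below assumes simple marginal-correlation screening on simulated genotypes. The function name `top_k_by_correlation`, the simulated data, and the values of `n`, `p`, and `k` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Return indices of the k columns of X with the largest |Pearson r| to y.

    Hypothetical helper for illustration; not the paper's selection method.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Monomorphic SNPs have zero variance; assign them zero correlation, not NaN.
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(corr))[:k]


# Simulated stand-in for a genotype matrix: n individuals, p SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, p, k = 200, 5000, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)

# Assumed toy phenotype: five causal SNPs with unit effects plus Gaussian noise.
y = X[:, :5] @ np.ones(5) + rng.normal(scale=1.0, size=n)

# Screen on the training rows only, so held-out individuals do not leak into selection.
train = np.arange(150)
selected = top_k_by_correlation(X[train], y[train], k)
X_train_small = X[train][:, selected]  # reduced input for a downstream model
```

Screening on the training split alone is the design choice worth noting here: selecting SNPs with the full data would inflate any test $R^2$ computed afterwards.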

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-bagheri26a,
  title = {Optimizing Bayesian Neural Networks for Genomic Prediction: A Study on Feature Selection and Architecture},
  author = {Bagheri, Raeein and Yan, Yan and Slater, Justin},
  booktitle = {Proceedings of The 39th Canadian Conference on Artificial Intelligence},
  pages = {187--198},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/bagheri26a/bagheri26a.pdf},
  url = {https://proceedings.mlr.press/v318/bagheri26a.html},
  abstract = {Genome-wide association studies (GWAS) scan the genome for genetic variants, typically single nucleotide polymorphisms, whose alleles are associated with phenotypic variation across individuals. GWAS and genomic prediction face a core challenge: learning from extremely high-dimensional genotype matrices under limited sample sizes. Bayesian neural networks offer uncertainty-aware prediction and the capacity to represent nonlinear genetic effects, but their practical performance depends on feature selection and architectural choices that interact with the inference mechanism. This paper presents an empirical study that improves a Bayesian neural network pipeline for genomic prediction by tuning input selection strategies, network depth and width, and activation functions under Hamiltonian Monte Carlo inference. We compare three approaches: a deterministic ResNet baseline, a standard “out of the box” Bayesian neural network, and an optimized Bayesian neural network produced through targeted tuning. Results show that feature selection is necessary for stable learning under the large $p$, small $n$ regime and that smooth activations are primary drivers of improved posterior exploration and predictive accuracy. On an Ear Height benchmark from the TASSEL tutorial ecosystem, the optimized BNN achieves a test $R^2$ near $0.68$, outperforming the standard BNN and the deterministic baseline.}
}
Endnote
%0 Conference Paper
%T Optimizing Bayesian Neural Networks for Genomic Prediction: A Study on Feature Selection and Architecture
%A Raeein Bagheri
%A Yan Yan
%A Justin Slater
%B Proceedings of The 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-bagheri26a
%I PMLR
%P 187--198
%U https://proceedings.mlr.press/v318/bagheri26a.html
%V 318
%X Genome-wide association studies (GWAS) scan the genome for genetic variants, typically single nucleotide polymorphisms, whose alleles are associated with phenotypic variation across individuals. GWAS and genomic prediction face a core challenge: learning from extremely high-dimensional genotype matrices under limited sample sizes. Bayesian neural networks offer uncertainty-aware prediction and the capacity to represent nonlinear genetic effects, but their practical performance depends on feature selection and architectural choices that interact with the inference mechanism. This paper presents an empirical study that improves a Bayesian neural network pipeline for genomic prediction by tuning input selection strategies, network depth and width, and activation functions under Hamiltonian Monte Carlo inference. We compare three approaches: a deterministic ResNet baseline, a standard “out of the box” Bayesian neural network, and an optimized Bayesian neural network produced through targeted tuning. Results show that feature selection is necessary for stable learning under the large $p$, small $n$ regime and that smooth activations are primary drivers of improved posterior exploration and predictive accuracy. On an Ear Height benchmark from the TASSEL tutorial ecosystem, the optimized BNN achieves a test $R^2$ near $0.68$, outperforming the standard BNN and the deterministic baseline.
APA
Bagheri, R., Yan, Y. & Slater, J. (2026). Optimizing Bayesian Neural Networks for Genomic Prediction: A Study on Feature Selection and Architecture. Proceedings of The 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:187-198. Available from https://proceedings.mlr.press/v318/bagheri26a.html.
