Optimizing Bayesian Neural Networks for Genomic Prediction: A Study on Feature Selection and Architecture
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:187-198, 2026.
Abstract
Genome-wide association studies (GWAS) scan the genome for genetic variants, typically single-nucleotide polymorphisms, whose alleles are associated with phenotypic variation across individuals. GWAS and genomic prediction face a core challenge: learning from extremely high-dimensional genotype matrices under limited sample sizes. Bayesian neural networks offer uncertainty-aware prediction and the capacity to represent nonlinear genetic effects, but their practical performance depends on feature selection and architectural choices that interact with the inference mechanism. This paper presents an empirical study that improves a Bayesian neural network pipeline for genomic prediction by tuning input selection strategies, network depth and width, and activation functions under Hamiltonian Monte Carlo inference. We compare three approaches: a deterministic ResNet baseline, a standard “out-of-the-box” Bayesian neural network (BNN), and an optimized BNN produced through targeted tuning. Results show that feature selection is necessary for stable learning in the large $p$, small $n$ regime and that smooth activations are primary drivers of improved posterior exploration and predictive accuracy. On an Ear Height benchmark from the TASSEL tutorial ecosystem, the optimized BNN achieves a test $R^2$ near $0.68$, outperforming the standard BNN and the deterministic baseline.