Context-Dependent Genetic Modifiers of Huntington’s Disease Revealed Through Multimodal Machine Learning

Jordi Abante, Caterina Fuses
Proceedings of the 20th Machine Learning in Computational Biology meeting, PMLR 311:128-147, 2025.

Abstract

Huntington’s disease (HD) exhibits substantial variability in age of onset (AO), only partially explained by the length of the CAG repeat in the HTT gene. While most studies seeking additional genetic modifiers (GeMs) have relied on linear models, we investigate the potential of non-linear machine learning (ML) approaches, such as tree-based models and graph neural networks (GNNs), to capture complex, context-dependent genetic interactions influencing AO. To address the challenges posed by high-dimensional genotyping data, we introduce a strategy based on gene-specific variational autoencoders for genotype compression. This framework reveals novel modifiers with effects dependent on CAG repeat length, underscoring the importance of accounting for feature interactions. Additionally, we integrate predicted gene expression levels from Borzoi, a genomic language model (gLM), into a multimodal prediction architecture. This integration allows us to identify regulatory variants likely to affect AO through expression changes. To our knowledge, this is the first application of a gLM in multimodal genotype-to-phenotype prediction, offering a new paradigm for interpretable modeling of complex traits in HD and related polyglutamine disorders.

Cite this Paper


BibTeX
@InProceedings{pmlr-v311-abante25a,
  title = {Context-Dependent Genetic Modifiers of Huntington’s Disease Revealed Through Multimodal Machine Learning},
  author = {Abante, Jordi and Fuses, Caterina},
  booktitle = {Proceedings of the 20th Machine Learning in Computational Biology meeting},
  pages = {128--147},
  year = {2025},
  editor = {Knowles, David A and Koo, Peter K},
  volume = {311},
  series = {Proceedings of Machine Learning Research},
  month = {10--11 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v311/main/assets/abante25a/abante25a.pdf},
  url = {https://proceedings.mlr.press/v311/abante25a.html},
  abstract = {Huntington’s disease (HD) exhibits substantial variability in age of onset (AO), only partially explained by the length of the CAG repeat in the HTT gene. While most studies seeking additional genetic modifiers (GeMs) have relied on linear models, we investigate the potential of non-linear machine learning (ML) approaches, such as tree-based models and graph neural networks (GNNs), to capture complex, context-dependent genetic interactions influencing AO. To address the challenges posed by high-dimensional genotyping data, we introduce a strategy based on gene-specific variational autoencoders for genotype compression. This framework reveals novel modifiers with effects dependent on CAG repeat length, underscoring the importance of accounting for feature interactions. Additionally, we integrate predicted gene expression levels from Borzoi, a genomic language model (gLM), into a multimodal prediction architecture. This integration allows us to identify regulatory variants likely to affect AO through expression changes. To our knowledge, this is the first application of a gLM in multimodal genotype-to-phenotype prediction, offering a new paradigm for interpretable modeling of complex traits in HD and related polyglutamine disorders.}
}
Endnote
%0 Conference Paper
%T Context-Dependent Genetic Modifiers of Huntington’s Disease Revealed Through Multimodal Machine Learning
%A Jordi Abante
%A Caterina Fuses
%B Proceedings of the 20th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2025
%E David A Knowles
%E Peter K Koo
%F pmlr-v311-abante25a
%I PMLR
%P 128--147
%U https://proceedings.mlr.press/v311/abante25a.html
%V 311
%X Huntington’s disease (HD) exhibits substantial variability in age of onset (AO), only partially explained by the length of the CAG repeat in the HTT gene. While most studies seeking additional genetic modifiers (GeMs) have relied on linear models, we investigate the potential of non-linear machine learning (ML) approaches, such as tree-based models and graph neural networks (GNNs), to capture complex, context-dependent genetic interactions influencing AO. To address the challenges posed by high-dimensional genotyping data, we introduce a strategy based on gene-specific variational autoencoders for genotype compression. This framework reveals novel modifiers with effects dependent on CAG repeat length, underscoring the importance of accounting for feature interactions. Additionally, we integrate predicted gene expression levels from Borzoi, a genomic language model (gLM), into a multimodal prediction architecture. This integration allows us to identify regulatory variants likely to affect AO through expression changes. To our knowledge, this is the first application of a gLM in multimodal genotype-to-phenotype prediction, offering a new paradigm for interpretable modeling of complex traits in HD and related polyglutamine disorders.
APA
Abante, J. & Fuses, C. (2025). Context-Dependent Genetic Modifiers of Huntington’s Disease Revealed Through Multimodal Machine Learning. Proceedings of the 20th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 311:128-147. Available from https://proceedings.mlr.press/v311/abante25a.html.
