Harnessing DNA Foundation Models for Cross-Species Transcription Factor Binding Site Prediction in Plant Genomes

Maryam Haghani, Krishna vamsi Dhulipalla, Song Li
Proceedings of the 20th Machine Learning in Computational Biology meeting, PMLR 311:189-198, 2025.

Abstract

Accurate prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. While experimental methods like ChIP-seq and DAP-seq are informative, they are labor-intensive and species-specific. Recent advancements in large-scale pretrained DNA foundation models have shown promise in overcoming these limitations. This study evaluates the performance of three such models—DNABERT-2, AgroNT, and HyenaDNA—in predicting TFBSs in plants. Using DAP-seq data from Arabidopsis thaliana and Sisymbrium irio, we benchmark their accuracy against specialized approaches, including a motif-based method and two deep learning models, DeepBind and BERT-TFBS. Our results demonstrate that foundation models, particularly HyenaDNA, offer superior predictive accuracy and computational efficiency, highlighting their potential for scalable, genome-wide TFBS prediction in plants.

Cite this Paper


BibTeX
@InProceedings{pmlr-v311-haghani25a,
  title     = {Harnessing DNA Foundation Models for Cross-Species Transcription Factor Binding Site Prediction in Plant Genomes},
  author    = {Haghani, Maryam and Dhulipalla, Krishna vamsi and Li, Song},
  booktitle = {Proceedings of the 20th Machine Learning in Computational Biology meeting},
  pages     = {189--198},
  year      = {2025},
  editor    = {Knowles, David A and Koo, Peter K},
  volume    = {311},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--11 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v311/main/assets/haghani25a/haghani25a.pdf},
  url       = {https://proceedings.mlr.press/v311/haghani25a.html},
  abstract  = {Accurate prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. While experimental methods like ChIP-seq and DAP-seq are informative, they are labor-intensive and species-specific. Recent advancements in large-scale pretrained DNA foundation models have shown promise in overcoming these limitations. This study evaluates the performance of three such models---DNABERT-2, AgroNT, and HyenaDNA---in predicting TFBSs in plants. Using DAP-seq data from Arabidopsis thaliana and Sisymbrium irio, we benchmark their accuracy against specialized approaches, including a motif-based method and two deep learning models, DeepBind and BERT-TFBS. Our results demonstrate that foundation models, particularly HyenaDNA, offer superior predictive accuracy and computational efficiency, highlighting their potential for scalable, genome-wide TFBS prediction in plants.}
}
Endnote
%0 Conference Paper
%T Harnessing DNA Foundation Models for Cross-Species Transcription Factor Binding Site Prediction in Plant Genomes
%A Maryam Haghani
%A Krishna vamsi Dhulipalla
%A Song Li
%B Proceedings of the 20th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2025
%E David A Knowles
%E Peter K Koo
%F pmlr-v311-haghani25a
%I PMLR
%P 189--198
%U https://proceedings.mlr.press/v311/haghani25a.html
%V 311
%X Accurate prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. While experimental methods like ChIP-seq and DAP-seq are informative, they are labor-intensive and species-specific. Recent advancements in large-scale pretrained DNA foundation models have shown promise in overcoming these limitations. This study evaluates the performance of three such models—DNABERT-2, AgroNT, and HyenaDNA—in predicting TFBSs in plants. Using DAP-seq data from Arabidopsis thaliana and Sisymbrium irio, we benchmark their accuracy against specialized approaches, including a motif-based method and two deep learning models, DeepBind and BERT-TFBS. Our results demonstrate that foundation models, particularly HyenaDNA, offer superior predictive accuracy and computational efficiency, highlighting their potential for scalable, genome-wide TFBS prediction in plants.
APA
Haghani, M., Dhulipalla, K.v. & Li, S. (2025). Harnessing DNA Foundation Models for Cross-Species Transcription Factor Binding Site Prediction in Plant Genomes. Proceedings of the 20th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 311:189-198. Available from https://proceedings.mlr.press/v311/haghani25a.html.
