SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model

Zhao Yang, Jiwei Zhu, Bing Su
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:71648-71667, 2025.

Abstract

Inspired by the success of unsupervised pre-training paradigms, researchers have applied these approaches to DNA pre-training. However, we argue that these approaches alone yield suboptimal results because pure DNA sequences lack sufficient information, since their functions are regulated by genomic profiles like chromatin accessibility. Here, we demonstrate that supervised training for genomic profile prediction serves as a more effective alternative to pure sequence pre-training. Furthermore, considering the multi-species and multi-profile nature of genomic profile prediction, we introduce our Species-Profile Adaptive Collaborative Experts (SPACE) that leverages Mixture of Experts (MoE) to better capture the relationships between DNA sequences across different species and genomic profiles, thereby learning more effective DNA representations. Through extensive experiments across various tasks, our model achieves state-of-the-art performance, establishing that DNA models trained with supervised genomic profiles serve as powerful DNA representation learners.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yang25ay,
  title = {{SPACE}: Your Genomic Profile Predictor is a Powerful {DNA} Foundation Model},
  author = {Yang, Zhao and Zhu, Jiwei and Su, Bing},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {71648--71667},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yang25ay/yang25ay.pdf},
  url = {https://proceedings.mlr.press/v267/yang25ay.html},
  abstract = {Inspired by the success of unsupervised pre-training paradigms, researchers have applied these approaches to DNA pre-training. However, we argue that these approaches alone yield suboptimal results because pure DNA sequences lack sufficient information, since their functions are regulated by genomic profiles like chromatin accessibility. Here, we demonstrate that supervised training for genomic profile prediction serves as a more effective alternative to pure sequence pre-training. Furthermore, considering the multi-species and multi-profile nature of genomic profile prediction, we introduce our Species-Profile Adaptive Collaborative Experts (SPACE) that leverages Mixture of Experts (MoE) to better capture the relationships between DNA sequences across different species and genomic profiles, thereby learning more effective DNA representations. Through extensive experiments across various tasks, our model achieves state-of-the-art performance, establishing that DNA models trained with supervised genomic profiles serve as powerful DNA representation learners.}
}
Endnote
%0 Conference Paper
%T SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model
%A Zhao Yang
%A Jiwei Zhu
%A Bing Su
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-yang25ay
%I PMLR
%P 71648--71667
%U https://proceedings.mlr.press/v267/yang25ay.html
%V 267
%X Inspired by the success of unsupervised pre-training paradigms, researchers have applied these approaches to DNA pre-training. However, we argue that these approaches alone yield suboptimal results because pure DNA sequences lack sufficient information, since their functions are regulated by genomic profiles like chromatin accessibility. Here, we demonstrate that supervised training for genomic profile prediction serves as a more effective alternative to pure sequence pre-training. Furthermore, considering the multi-species and multi-profile nature of genomic profile prediction, we introduce our Species-Profile Adaptive Collaborative Experts (SPACE) that leverages Mixture of Experts (MoE) to better capture the relationships between DNA sequences across different species and genomic profiles, thereby learning more effective DNA representations. Through extensive experiments across various tasks, our model achieves state-of-the-art performance, establishing that DNA models trained with supervised genomic profiles serve as powerful DNA representation learners.
APA
Yang, Z., Zhu, J. & Su, B. (2025). SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:71648-71667. Available from https://proceedings.mlr.press/v267/yang25ay.html.
