InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference

Tianyu Cui, Song-Jun Xu, Artem Moskalev, Shuwei Li, Tommaso Mansi, Mangal Prakash, Rui Liao
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:11604-11619, 2025.

Abstract

Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases—such as class imbalances of GT interactions—rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. InfoSEM can also integrate GT labels as an additional prior when available, avoiding biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals learned biases of existing supervised methods. InfoSEM outperforms existing models by 38.5% across four datasets using textual embeddings prior and further boosts performance by 11.1% when integrating labeled data as priors.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cui25e, title = {{I}nfo{SEM}: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference}, author = {Cui, Tianyu and Xu, Song-Jun and Moskalev, Artem and Li, Shuwei and Mansi, Tommaso and Prakash, Mangal and Liao, Rui}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {11604--11619}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cui25e/cui25e.pdf}, url = {https://proceedings.mlr.press/v267/cui25e.html}, abstract = {Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases—such as class imbalances of GT interactions—rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. InfoSEM can also integrate GT labels as an additional prior when available, avoiding biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals learned biases of existing supervised methods. InfoSEM outperforms existing models by 38.5% across four datasets using textual embeddings prior and further boosts performance by 11.1% when integrating labeled data as priors.} }
Endnote
%0 Conference Paper %T InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference %A Tianyu Cui %A Song-Jun Xu %A Artem Moskalev %A Shuwei Li %A Tommaso Mansi %A Mangal Prakash %A Rui Liao %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-cui25e %I PMLR %P 11604--11619 %U https://proceedings.mlr.press/v267/cui25e.html %V 267 %X Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases—such as class imbalances of GT interactions—rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. InfoSEM can also integrate GT labels as an additional prior when available, avoiding biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals learned biases of existing supervised methods. InfoSEM outperforms existing models by 38.5% across four datasets using textual embeddings prior and further boosts performance by 11.1% when integrating labeled data as priors.
APA
Cui, T., Xu, S., Moskalev, A., Li, S., Mansi, T., Prakash, M. & Liao, R.. (2025). InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:11604-11619 Available from https://proceedings.mlr.press/v267/cui25e.html.

Related Material