Joint learning of Gaussian graphical models in heterogeneous dependencies of high-dimensional transcriptomic data

Dung Ngoc Nguyen, Zitong Li
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:511-526, 2025.

Abstract

In biology, the construction of a gene co-expression network is a difficult research problem, due to the high dimensionality of the data and the heterogeneity of the samples. Furthermore, observations from two or more groups sharing the same biological variables require the comparison of gene co-expression patterns with some commonalities between the groups. In this context, we propose a mixture of Gaussian graphical models for paired data to estimate heterogeneous dependencies and recover sub-population networks from these complicated biological data, with certain sparsity and symmetry constraints on the two groups of dependent variables. We develop an efficient generalized expectation-maximization (EM) algorithm for penalized maximum likelihood estimation with the fusion of a graphical lasso penalty. We demonstrate the numerical performance of our method in simulation studies, where the new method outperforms the classical graphical lasso approach in terms of model fitting. A real-world application to estimating gene networks on a high-dimensional ecological transcriptomics data set of nine-spined stickleback is also provided. Our new approach identified similarities and differences between groups of genes from the brain and liver tissues of samples collected from two habitats. These results show the efficiency of our approach for identifying complicated interactions from high-dimensional and heterogeneous gene expression data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-nguyen25b,
  title = {Joint learning of Gaussian graphical models in heterogeneous dependencies of high-dimensional transcriptomic data},
  author = {Nguyen, Dung Ngoc and Li, Zitong},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages = {511--526},
  year = {2025},
  editor = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume = {260},
  series = {Proceedings of Machine Learning Research},
  month = {05--08 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/nguyen25b/nguyen25b.pdf},
  url = {https://proceedings.mlr.press/v260/nguyen25b.html},
  abstract = {In biology, the construction of a gene co-expression network is a difficult research problem, due to the high dimensionality of the data and the heterogeneity of the samples. Furthermore, observations from two or more groups sharing the same biological variables require the comparison of gene co-expression patterns with some commonalities between the groups. In this context, we propose a mixture of Gaussian graphical models for paired data to estimate heterogeneous dependencies and recover sub-population networks from these complicated biological data with certain sparsity and symmetry constraints of two groups of dependent variables. We develop an efficient generalized expectation-maximization (EM) algorithm for penalized maximum likelihood estimation with the fusion of a graphical lasso penalty. As a result, we demonstrate the numerical performance of our method in simulation studies, with the new method outperforming the classical graphical lasso approach in terms of model fitting. A real-world application for estimating gene networks on a high dimensional ecological transcriptomics data set of nine-spined stickleback has also been provided. Our new approach identified similarities and differences between groups of genes from the brain and liver tissues of samples collected from two habitats. These results show the efficiency of our approach to the identification of complicated interactions from high-dimensional and heterogeneous gene expression data.}
}
Endnote
%0 Conference Paper
%T Joint learning of Gaussian graphical models in heterogeneous dependencies of high-dimensional transcriptomic data
%A Dung Ngoc Nguyen
%A Zitong Li
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-nguyen25b
%I PMLR
%P 511--526
%U https://proceedings.mlr.press/v260/nguyen25b.html
%V 260
%X In biology, the construction of a gene co-expression network is a difficult research problem, due to the high dimensionality of the data and the heterogeneity of the samples. Furthermore, observations from two or more groups sharing the same biological variables require the comparison of gene co-expression patterns with some commonalities between the groups. In this context, we propose a mixture of Gaussian graphical models for paired data to estimate heterogeneous dependencies and recover sub-population networks from these complicated biological data with certain sparsity and symmetry constraints of two groups of dependent variables. We develop an efficient generalized expectation-maximization (EM) algorithm for penalized maximum likelihood estimation with the fusion of a graphical lasso penalty. As a result, we demonstrate the numerical performance of our method in simulation studies, with the new method outperforming the classical graphical lasso approach in terms of model fitting. A real-world application for estimating gene networks on a high dimensional ecological transcriptomics data set of nine-spined stickleback has also been provided. Our new approach identified similarities and differences between groups of genes from the brain and liver tissues of samples collected from two habitats. These results show the efficiency of our approach to the identification of complicated interactions from high-dimensional and heterogeneous gene expression data.
APA
Nguyen, D.N. & Li, Z. (2025). Joint learning of Gaussian graphical models in heterogeneous dependencies of high-dimensional transcriptomic data. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:511-526. Available from https://proceedings.mlr.press/v260/nguyen25b.html.
