Joint learning of Gaussian graphical models in heterogeneous dependencies of high-dimensional transcriptomic data
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:511-526, 2025.
Abstract
In biology, constructing a gene co-expression network is a difficult research problem because of the high dimensionality of the data and the heterogeneity of the samples. Furthermore, when observations come from two or more groups that share the same biological variables, gene co-expression patterns must be compared across groups while allowing for some commonalities between them. In this context, we propose a mixture of Gaussian graphical models for paired data to estimate heterogeneous dependencies and recover sub-population networks from such complex biological data, under sparsity and symmetry constraints on two groups of dependent variables. We develop an efficient generalized expectation-maximization (EM) algorithm for penalized maximum likelihood estimation that incorporates a fused graphical lasso penalty. We demonstrate the numerical performance of our method in simulation studies, where it outperforms the classical graphical lasso approach in terms of model fitting. We also present a real-world application that estimates gene networks from a high-dimensional ecological transcriptomics data set of nine-spined stickleback. Our approach identified similarities and differences between groups of genes from the brain and liver tissues of samples collected from two habitats. These results show the efficiency of our approach in identifying complex interactions from high-dimensional and heterogeneous gene expression data.
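As a reading aid, the kind of penalized objective and generalized EM updates described above can be sketched as follows. This is a generic formulation for a K-component mixture of Gaussian graphical models with an l1 (graphical lasso) penalty, not necessarily the paper's exact model: the symbols $\pi_k$, $\mu_k$, $\Omega_k$, $\tau_{ik}$, $S_k$, and $\lambda$ are assumptions introduced here, and the symmetry constraints for paired data mentioned in the abstract are omitted.

```latex
% Penalized log-likelihood for n samples x_i in R^p, with mixing weights pi_k,
% means mu_k, and precision (inverse covariance) matrices Omega_k.
\begin{align}
\ell_\lambda(\pi, \mu, \Omega)
  &= \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \,
     \phi\!\left(x_i;\, \mu_k,\, \Omega_k^{-1}\right)
     \;-\; \lambda \sum_{k=1}^{K} \sum_{j \neq l} \left| (\Omega_k)_{jl} \right| \\
% E-step: posterior probability that sample i belongs to component k.
\tau_{ik}
  &= \frac{\pi_k \, \phi\!\left(x_i;\, \mu_k,\, \Omega_k^{-1}\right)}
          {\sum_{m=1}^{K} \pi_m \, \phi\!\left(x_i;\, \mu_m,\, \Omega_m^{-1}\right)} \\
% M-step: weighted updates, with n_k = \sum_i \tau_{ik}.
\pi_k &= \frac{n_k}{n}, \qquad
\mu_k = \frac{1}{n_k} \sum_{i=1}^{n} \tau_{ik}\, x_i, \qquad
S_k = \frac{1}{n_k} \sum_{i=1}^{n} \tau_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\top} \\
% Each precision update is a graphical lasso problem on the weighted covariance S_k.
\Omega_k
  &= \arg\max_{\Omega \succ 0} \;
     \log\det \Omega \;-\; \operatorname{tr}(S_k \Omega)
     \;-\; \frac{2\lambda}{n_k} \sum_{j \neq l} \left| \Omega_{jl} \right|
\end{align}
```

In this sketch, the precision update reduces to a standard graphical lasso problem per component, which is why an EM scheme can reuse an existing graphical lasso solver inside the M-step; a "generalized" EM only needs each M-step to increase the penalized objective rather than maximize it exactly.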