Learning Latent Variable Models by Pairwise Cluster Comparison
Proceedings of the Asian Conference on Machine Learning, PMLR 25:33-48, 2012.
Identification of latent variables that govern a problem and the relationships among them given measurements in the observed world are important for causal discovery. This identification can be made by analyzing constraints imposed by the latents in the measurements. We introduce the concept of pairwise cluster comparison PCC to identify causal relationships from clusters and a two-stage algorithm, called LPCC, that learns a latent variable model (LVM) using PCC. First, LPCC learns the exogenous and the collider latents, as well as their observed descendants, by utilizing pairwise comparisons between clusters in the measurement space that may explain latent causes. Second, LPCC learns the non-collider endogenous latents and their children by splitting these latents from their previously learned latent ancestors. LPCC is not limited to linear or latent-tree models and does not make assumptions about the distribution. Using simulated and real-world datasets, we show that LPCC improves accuracy with the sample size, can learn large LVMs, and is accurate in learning compared to state-of-the-art algorithms.