Federated Variational Inference for Bayesian Mixture Models
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:826-863, 2026.
Abstract
We present a one-shot, unsupervised federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets, motivated by the need to identify patient clusters in privacy-sensitive electronic health record (EHR) data. We introduce a principled ‘divide-and-conquer’ inference procedure that uses variational inference with local merge and delete moves, applied in parallel within batches of the data, followed by global merge moves across batches to find global clustering structures. We show that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without sharing the full dataset. Empirical results on simulated and benchmark datasets show that our method performs well relative to competing clustering algorithms. We validate the practical utility of the method by applying it to a large-scale British primary care EHR dataset to identify clusters of individuals with common patterns of co-occurring conditions (multimorbidity).
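The claim that merge moves need only per-batch summaries can be illustrated with a minimal sketch. It assumes a Bernoulli mixture for binary data with independent Beta(a0, b0) priors on each feature probability. For each cluster, a node shares only the expected cluster size N_k = Σ_n r_nk and the per-feature counts S_kd = Σ_n r_nk x_nd, where r_nk are the responsibilities. Because a merged cluster's responsibilities are r_nj + r_nk, these statistics simply add. This holds whether the two clusters sit in the same batch or on different nodes. The merge score below is a hypothetical simplification, not the paper's variational criterion. It uses the change in the Beta-Bernoulli log marginal likelihood, which is exact only for hard assignments and ignores mixture-weight terms. The names `ClusterSummary`, `summarize`, `merge`, `log_marginal`, and `merge_gain` are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster summary a node shares instead of raw records."""
    n: float        # N_k = sum_n r_nk (expected cluster size)
    s: List[float]  # S_kd = sum_n r_nk * x_nd (expected count of ones per feature)


def summarize(X: Sequence[Sequence[int]], r: Sequence[float]) -> ClusterSummary:
    """Compute one cluster's summary from local binary data X and responsibilities r."""
    d = len(X[0])
    n = float(sum(r))
    s = [float(sum(rn * x[j] for rn, x in zip(r, X))) for j in range(d)]
    return ClusterSummary(n, s)


def merge(a: ClusterSummary, b: ClusterSummary) -> ClusterSummary:
    """Merging two clusters adds their statistics, since r_merged = r_a + r_b."""
    return ClusterSummary(a.n + b.n, [sa + sb for sa, sb in zip(a.s, b.s)])


def log_beta(p: float, q: float) -> float:
    """Log of the Beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def log_marginal(c: ClusterSummary, a0: float = 1.0, b0: float = 1.0) -> float:
    """Beta-Bernoulli log marginal likelihood of a cluster, computed from its summary only."""
    return sum(log_beta(a0 + sd, b0 + c.n - sd) - log_beta(a0, b0) for sd in c.s)


def merge_gain(a: ClusterSummary, b: ClusterSummary,
               a0: float = 1.0, b0: float = 1.0) -> float:
    """Simplified merge score: positive values favour merging a and b."""
    return (log_marginal(merge(a, b), a0, b0)
            - log_marginal(a, a0, b0) - log_marginal(b, a0, b0))


if __name__ == "__main__":
    # Three toy nodes, each holding one hard-assigned cluster (illustrative data).
    A = summarize([[1, 1, 0], [1, 1, 0], [1, 0, 0]], [1.0] * 3)
    B = summarize([[1, 1, 0], [1, 1, 1]], [1.0] * 2)
    C = summarize([[0, 0, 1], [0, 0, 1], [0, 1, 1]], [1.0] * 3)
    print(merge_gain(A, B))  # positive: similar patterns on different nodes
    print(merge_gain(A, C))  # negative: dissimilar patterns
```

In this toy example, merging the two similar clusters held on different nodes gives a positive score, and merging dissimilar ones gives a negative score. The merged summary of A and B is also identical to the one computed from their pooled raw records, even though those records never leave their nodes. The same `summarize` function accepts soft (fractional) responsibilities unchanged.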