Federated Variational Inference for Bayesian Mixture Models

Jackie Rao, Francesca L. Crowe, Tom Marshall, Sylvia Richardson, Paul D. W. Kirk
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:826-863, 2026.

Abstract

We present a one-shot, unsupervised federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets, motivated by the need to identify patient clusters in privacy-sensitive electronic health record (EHR) data. We introduce a principled ‘divide-and-conquer’ inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by global merge moves across batches to find global clustering structures. We show that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without requiring the full dataset to be shared. Empirical results on simulated and benchmark datasets demonstrate that our method performs well relative to comparator clustering algorithms. We validate the practical utility of the method by applying it to a large-scale British primary care EHR dataset to identify clusters of individuals with common patterns of co-occurring conditions (multimorbidity).

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-rao26a,
  title     = {Federated Variational Inference for Bayesian Mixture Models},
  author    = {Rao, Jackie and Crowe, Francesca L. and Marshall, Tom and Richardson, Sylvia and Kirk, Paul D. W.},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages     = {826--863},
  year      = {2026},
  editor    = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume    = {297},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/rao26a/rao26a.pdf},
  url       = {https://proceedings.mlr.press/v297/rao26a.html},
  abstract  = {We present a one-shot, unsupervised federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets, motivated by the need to identify patient clusters in privacy-sensitive electronic health record ({EHR}) data. We introduce a principled ‘divide-and-conquer’ inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by global merge moves across batches to find global clustering structures. We show that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without requiring the full dataset to be shared. Empirical results on simulated and benchmark datasets demonstrate that our method performs well relative to comparator clustering algorithms. We validate the practical utility of the method by applying it to a large-scale British primary care {EHR} dataset to identify clusters of individuals with common patterns of co-occurring conditions (multimorbidity).}
}
Endnote
%0 Conference Paper
%T Federated Variational Inference for Bayesian Mixture Models
%A Jackie Rao
%A Francesca L. Crowe
%A Tom Marshall
%A Sylvia Richardson
%A Paul D. W. Kirk
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-rao26a
%I PMLR
%P 826--863
%U https://proceedings.mlr.press/v297/rao26a.html
%V 297
%X We present a one-shot, unsupervised federated learning approach for Bayesian model-based clustering of large-scale binary and categorical datasets, motivated by the need to identify patient clusters in privacy-sensitive electronic health record (EHR) data. We introduce a principled ‘divide-and-conquer’ inference procedure using variational inference with local merge and delete moves within batches of the data in parallel, followed by global merge moves across batches to find global clustering structures. We show that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without requiring the full dataset to be shared. Empirical results on simulated and benchmark datasets demonstrate that our method performs well relative to comparator clustering algorithms. We validate the practical utility of the method by applying it to a large-scale British primary care EHR dataset to identify clusters of individuals with common patterns of co-occurring conditions (multimorbidity).
APA
Rao, J., Crowe, F.L., Marshall, T., Richardson, S. & Kirk, P.D.W. (2026). Federated Variational Inference for Bayesian Mixture Models. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:826-863. Available from https://proceedings.mlr.press/v297/rao26a.html.