Manifold-Informed Cohort Discovery (MICD): A Framework for Uncovering Latent Risk Signals in Imbalanced Healthcare Data

Jamell Dacon, Chelsea Minard, Oluwatobi Olajide, Chukwulenyeudo Uwaeme, Chukwuemeka Obasi, Michael Mosuro, Oluwasegun Soji-John, Iyinoluwa Ayodele
Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 317:94-101, 2026.

Abstract

Risk stratification for Coronary Heart Disease (CHD) is fundamentally challenged by severe class imbalance and the structural heterogeneity of the non-diseased patient cohort. Standard classification models, by treating all CHD-negative patients uniformly, fail to detect critical, latent high-risk sub-groups. We introduce the Manifold-Informed Cohort Discovery (MICD) Framework, a novel methodology that systematically integrates clinically-informed feature selection, Manifold Learning (UMAP), and proximity-based clustering to extract these latent risk signals. Our core insight is that individuals with latent high-risk profiles exist in close geometric proximity to true CHD-positive cases within the UMAP-embedded feature space. We validate the framework’s clinical relevance by autonomously isolating a high-risk negative cohort whose feature profile strongly aligns with the established diagnostic markers of Metabolic Syndrome. This alignment proves that our abstract geometric approach encodes a biologically and clinically meaningful pre-disease state. When the insights from this cohort discovery are used in a downstream classification task, the MICD-enhanced model achieves pre-eminent predictive performance (AUROC $\sim$ 85.1%), significantly outperforming the clinical gold standard (ASCVD Risk Calculator) and state-of-the-art imbalanced learning methods (Focal Loss, SMOTE). Our work establishes a critical, interpretable link between unsupervised data structure and actionable supervised clinical prediction, providing a powerful tool for early, preventative intervention.
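
The proximity idea in the abstract (CHD-negative patients lying close to CHD-positive cases in the embedded space) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the clustering procedure, so the code below swaps in a simple k-nearest-positive distance score as a stand-in. It assumes a low-dimensional embedding has already been computed (for example by UMAP), and the function name `flag_near_positive`, the parameters `k` and `quantile`, and the toy coordinates are all hypothetical.

```python
import math


def flag_near_positive(embedding, labels, k=3, quantile=0.2):
    """Flag label-0 points that lie closest to label-1 points.

    Hypothetical helper, not the MICD procedure itself. Each negative is
    scored by its mean Euclidean distance to its k nearest positives;
    the lowest-scoring `quantile` fraction of negatives is returned.
    """
    positives = [p for p, y in zip(embedding, labels) if y == 1]
    if not positives:
        raise ValueError("at least one positive case is required")
    k = min(k, len(positives))

    scores = {}
    for i, (point, y) in enumerate(zip(embedding, labels)):
        if y == 0:
            dists = sorted(math.dist(point, q) for q in positives)
            scores[i] = sum(dists[:k]) / k

    ranked = sorted(scores, key=scores.get)  # closest negatives first
    n_flag = max(1, round(quantile * len(ranked)))
    return ranked[:n_flag], scores


# Toy 2-D coordinates standing in for an embedding (assumed, not real data).
positives = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
far_negatives = [(0.0, 0.0), (0.3, -0.2), (-0.1, 0.4), (0.5, 0.5),
                 (1.0, 0.2), (0.2, 1.1), (-0.4, -0.3), (0.8, 0.9)]
near_negatives = [(4.6, 4.7), (5.1, 4.5)]

embedding = positives + far_negatives + near_negatives
labels = [1] * len(positives) + [0] * (len(far_negatives) + len(near_negatives))

flagged, scores = flag_near_positive(embedding, labels, k=3, quantile=0.2)
print("flagged negative indices:", sorted(flagged))
```

In this toy setup the two negatives placed next to the positive cluster are the ones flagged. On real data the score would only be as meaningful as the embedding, and distances in a UMAP projection should be read with care because UMAP does not preserve global distances exactly.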

Cite this Paper


BibTeX
@InProceedings{pmlr-v317-dacon26b,
  title = {Manifold-Informed Cohort Discovery (MICD): A Framework for Uncovering Latent Risk Signals in Imbalanced Healthcare Data},
  author = {Dacon, Jamell and Minard, Chelsea and Olajide, Oluwatobi and Uwaeme, Chukwulenyeudo and Obasi, Chukwuemeka and Mosuro, Michael and Soji-John, Oluwasegun and Ayodele, Iyinoluwa},
  booktitle = {Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare},
  pages = {94--101},
  year = {2026},
  editor = {Wu, Junde and Pan, Jiazhen and Zhu, Jiayuan and Luo, Luyang and Li, Yitong and Xu, Min and Jin, Yueming and Rueckert, Daniel},
  volume = {317},
  series = {Proceedings of Machine Learning Research},
  month = {20--21 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v317/main/assets/dacon26b/dacon26b.pdf},
  url = {https://proceedings.mlr.press/v317/dacon26b.html},
  abstract = {Risk stratification for Coronary Heart Disease (CHD) is fundamentally challenged by severe class imbalance and the structural heterogeneity of the non-diseased patient cohort. Standard classification models, by treating all CHD-negative patients uniformly, fail to detect critical, latent high-risk sub-groups. We introduce the Manifold-Informed Cohort Discovery (MICD) Framework, a novel methodology that systematically integrates clinically-informed feature selection, Manifold Learning (UMAP), and proximity-based clustering to extract these latent risk signals. Our core insight is that individuals with latent high-risk profiles exist in close geometric proximity to true CHD-positive cases within the UMAP-embedded feature space. We validate the framework’s clinical relevance by autonomously isolating a high-risk negative cohort whose feature profile strongly aligns with the established diagnostic markers of Metabolic Syndrome. This alignment proves that our abstract geometric approach encodes a biologically and clinically meaningful pre-disease state. When the insights from this cohort discovery are used in a downstream classification task, the MICD-enhanced model achieves pre-eminent predictive performance (AUROC $\sim$ 85.1\%), significantly outperforming the clinical gold standard (ASCVD Risk Calculator) and state-of-the-art imbalanced learning methods (Focal Loss, SMOTE). Our work establishes a critical, interpretable link between unsupervised data structure and actionable supervised clinical prediction, providing a powerful tool for early, preventative intervention.}
}
Endnote
%0 Conference Paper
%T Manifold-Informed Cohort Discovery (MICD): A Framework for Uncovering Latent Risk Signals in Imbalanced Healthcare Data
%A Jamell Dacon
%A Chelsea Minard
%A Oluwatobi Olajide
%A Chukwulenyeudo Uwaeme
%A Chukwuemeka Obasi
%A Michael Mosuro
%A Oluwasegun Soji-John
%A Iyinoluwa Ayodele
%B Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare
%C Proceedings of Machine Learning Research
%D 2026
%E Junde Wu
%E Jiazhen Pan
%E Jiayuan Zhu
%E Luyang Luo
%E Yitong Li
%E Min Xu
%E Yueming Jin
%E Daniel Rueckert
%F pmlr-v317-dacon26b
%I PMLR
%P 94--101
%U https://proceedings.mlr.press/v317/dacon26b.html
%V 317
%X Risk stratification for Coronary Heart Disease (CHD) is fundamentally challenged by severe class imbalance and the structural heterogeneity of the non-diseased patient cohort. Standard classification models, by treating all CHD-negative patients uniformly, fail to detect critical, latent high-risk sub-groups. We introduce the Manifold-Informed Cohort Discovery (MICD) Framework, a novel methodology that systematically integrates clinically-informed feature selection, Manifold Learning (UMAP), and proximity-based clustering to extract these latent risk signals. Our core insight is that individuals with latent high-risk profiles exist in close geometric proximity to true CHD-positive cases within the UMAP-embedded feature space. We validate the framework’s clinical relevance by autonomously isolating a high-risk negative cohort whose feature profile strongly aligns with the established diagnostic markers of Metabolic Syndrome. This alignment proves that our abstract geometric approach encodes a biologically and clinically meaningful pre-disease state. When the insights from this cohort discovery are used in a downstream classification task, the MICD-enhanced model achieves pre-eminent predictive performance (AUROC $\sim$ 85.1%), significantly outperforming the clinical gold standard (ASCVD Risk Calculator) and state-of-the-art imbalanced learning methods (Focal Loss, SMOTE). Our work establishes a critical, interpretable link between unsupervised data structure and actionable supervised clinical prediction, providing a powerful tool for early, preventative intervention.
APA
Dacon, J., Minard, C., Olajide, O., Uwaeme, C., Obasi, C., Mosuro, M., Soji-John, O., & Ayodele, I. (2026). Manifold-Informed Cohort Discovery (MICD): A Framework for Uncovering Latent Risk Signals in Imbalanced Healthcare Data. Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, in Proceedings of Machine Learning Research 317:94-101. Available from https://proceedings.mlr.press/v317/dacon26b.html.

Related Material