Identifiable Phenotyping using Constrained Non-Negative Matrix Factorization
Proceedings of the 1st Machine Learning for Healthcare Conference, PMLR 56:17-41, 2016.
This work proposes a new algorithm for automated and simultaneous phenotyping of multiple co-occurring medical conditions, also referred to as comorbidities, using clinical notes from electronic health records (EHRs). A latent factor estimation technique, non-negative matrix factorization (NMF), is augmented with domain constraints from weak supervision to obtain sparse latent factors that are grounded to a fixed set of chronic conditions. The proposed grounding mechanism ensures a one-to-one identifiable and interpretable mapping between the latent factors and the target comorbidities. Qualitative assessment of the empirical results by clinical experts shows that the proposed model learns clinically interpretable phenotypes, which are also shown to have competitive performance on a 30-day mortality prediction task. The proposed method can be readily adapted to any non-negative EHR data across various healthcare institutions.
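To make the idea of grounding NMF factors through weak supervision concrete, the following is a minimal sketch and not the algorithm of the paper. The abstract does not state the form of the constraints or the solver, so the sketch assumes that weak supervision is available as a binary mask saying which patients may load on which condition, and it substitutes standard Lee-Seung multiplicative updates for the optimization. The function name `masked_nmf`, the variable names, and the synthetic data are all hypothetical.

```python
import numpy as np

def masked_nmf(X, mask, n_iter=200, eps=1e-9, seed=0):
    """Factor X (terms x patients) as A @ W with A, W >= 0.

    mask is a (conditions x patients) 0/1 array; W is forced to zero
    wherever mask is zero. This support constraint is an assumption
    made for illustration, not the paper's stated formulation.
    """
    rng = np.random.default_rng(seed)
    n_terms, _ = X.shape
    n_conditions = mask.shape[0]
    A = rng.random((n_terms, n_conditions))   # one factor per condition
    W = rng.random(mask.shape) * mask         # start inside the allowed support
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius loss.
        # They keep entries non-negative and never revive a zero entry,
        # so the support constraint on W is preserved.
        W *= (A.T @ X) / (A.T @ A @ W + eps)
        W *= mask                             # explicit re-projection for safety
        A *= (X @ W.T) / (A @ (W @ W.T) + eps)
    return A, W

# Synthetic example: 20 terms, 15 patients, 3 conditions (all made up).
rng = np.random.default_rng(1)
mask = (rng.random((3, 15)) < 0.5).astype(float)
X = rng.random((20, 3)) @ (rng.random((3, 15)) * mask)
A, W = masked_nmf(X, mask)
```

Because factor k can only be active for patients weakly labeled with condition k, each column of `A` is tied to one named condition rather than to an arbitrary permutation of the latent factors, which is the intuition behind a one-to-one mapping.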