Classifying Lung Cancer Severity with Ensemble Machine Learning in Health Care Claims Data


Savannah L. Bergquist, Gabriel A. Brooks, Nancy L. Keating, Mary Beth Landrum, Sherri Rose
Proceedings of the 2nd Machine Learning for Healthcare Conference, PMLR 68:25-38, 2017.


Research in oncology quality of care and health outcomes has been limited by the difficulty of identifying cancer stage in health care claims data. Using linked cancer registry and Medicare claims data, we develop a tool for classifying lung cancer patients receiving chemotherapy into early- vs. late-stage cancer by (i) deploying ensemble machine learning for prediction, (ii) establishing a set of classification rules for the predicted probabilities, and (iii) considering an augmented set of administrative claims data. We find that our ensemble machine learning algorithm, with a classification rule defined by the median, substantially outperforms an existing clinical decision tree for this problem, yielding full-sample performance of 93% sensitivity, 92% specificity, and 93% accuracy. This work has the potential for broad applicability as provider organizations, payers, and policy makers seek to measure quality and outcomes of cancer care and improve on risk adjustment methods.
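
As a rough illustration of step (ii), the sketch below applies a median-based classification rule to a list of predicted probabilities and then computes sensitivity, specificity, and accuracy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy data, the choice of late stage as the positive class, and the strict "greater than the median" direction of the rule are all hypothetical, and the ensemble that would produce the probabilities is not shown.

```python
from statistics import median


def classify_by_median(probabilities):
    """Label a case 1 (assumed: late stage) when its predicted
    probability is strictly above the sample median, else 0."""
    cutoff = median(probabilities)
    return [1 if p > cutoff else 0 for p in probabilities]


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels.
    Assumes both classes occur in y_true (no zero denominators)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: stand-ins for ensemble output and registry stage.
predicted = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.90, 0.95]
true_stage = [0, 0, 0, 1, 0, 1, 1, 1]

labels = classify_by_median(predicted)
metrics = confusion_metrics(true_stage, labels)
print(labels)
print(metrics)
```

A median cutoff labels roughly half of the sample as positive by construction, so it is a natural fit only when the two classes are expected to be about evenly split; other cutoffs on the predicted probabilities would trade sensitivity against specificity differently.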
