Classifying Lung Cancer Severity with Ensemble Machine Learning in Health Care Claims Data

Savannah L. Bergquist, Gabriel A. Brooks, Nancy L. Keating, Mary Beth Landrum, Sherri Rose
Proceedings of the 2nd Machine Learning for Healthcare Conference, PMLR 68:25-38, 2017.

Abstract

Research in oncology quality of care and health outcomes has been limited by the difficulty of identifying cancer stage in health care claims data. Using linked cancer registry and Medicare claims data, we develop a tool for classifying lung cancer patients receiving chemotherapy into early vs. late stage cancer by (i) deploying ensemble machine learning for prediction, (ii) establishing a set of classification rules for the predicted probabilities, and (iii) considering an augmented set of administrative claims data. We find our ensemble machine learning algorithm with a classification rule defined by the median substantially outperforms an existing clinical decision tree for this problem, yielding full sample performance of 93% sensitivity, 92% specificity, and 93% accuracy. This work has the potential for broad applicability as provider organizations, payers, and policy makers seek to measure quality and outcomes of cancer care and improve on risk adjustment methods.
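The median-based classification rule and the reported metrics can be illustrated with a minimal sketch. It assumes that the positive class is late stage and that a case is labeled late stage when its predicted probability is strictly above the sample median; the abstract does not state either detail. The function names and the toy numbers are hypothetical and are not taken from the paper's data.

```python
from statistics import median


def classify_by_median(probs):
    """Label a case 1 (assumed: late stage) when its predicted
    probability is strictly above the sample median, else 0."""
    cutoff = median(probs)
    return [1 if p > cutoff else 0 for p in probs]


def evaluate(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy predicted probabilities and true labels (illustrative only).
toy_probs = [0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
toy_truth = [0, 0, 1, 0, 1, 1]

toy_pred = classify_by_median(toy_probs)
toy_metrics = evaluate(toy_truth, toy_pred)
print(toy_pred, toy_metrics)
```

In practice the predicted probabilities would come from the fitted ensemble, and the cutoff is only one of several candidate rules the abstract alludes to.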

Cite this Paper


BibTeX
@InProceedings{pmlr-v68-bergquist17a,
  title = {Classifying Lung Cancer Severity with Ensemble Machine Learning in Health Care Claims Data},
  author = {Bergquist, Savannah L. and Brooks, Gabriel A. and Keating, Nancy L. and Landrum, Mary Beth and Rose, Sherri},
  booktitle = {Proceedings of the 2nd Machine Learning for Healthcare Conference},
  pages = {25--38},
  year = {2017},
  editor = {Doshi-Velez, Finale and Fackler, Jim and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume = {68},
  series = {Proceedings of Machine Learning Research},
  month = {18--19 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v68/bergquist17a/bergquist17a.pdf},
  url = {https://proceedings.mlr.press/v68/bergquist17a.html},
  abstract = {Research in oncology quality of care and health outcomes has been limited by the difficulty of identifying cancer stage in health care claims data. Using linked cancer registry and Medicare claims data, we develop a tool for classifying lung cancer patients receiving chemotherapy into early vs. late stage cancer by (i) deploying ensemble machine learning for prediction, (ii) establishing a set of classification rules for the predicted probabilities, and (iii) considering an augmented set of administrative claims data. We find our ensemble machine learning algorithm with a classification rule defined by the median substantially outperforms an existing clinical decision tree for this problem, yielding full sample performance of 93% sensitivity, 92% specificity, and 93% accuracy. This work has the potential for broad applicability as provider organizations, payers, and policy makers seek to measure quality and outcomes of cancer care and improve on risk adjustment methods.}
}
Endnote
%0 Conference Paper
%T Classifying Lung Cancer Severity with Ensemble Machine Learning in Health Care Claims Data
%A Savannah L. Bergquist
%A Gabriel A. Brooks
%A Nancy L. Keating
%A Mary Beth Landrum
%A Sherri Rose
%B Proceedings of the 2nd Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2017
%E Finale Doshi-Velez
%E Jim Fackler
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v68-bergquist17a
%I PMLR
%P 25--38
%U https://proceedings.mlr.press/v68/bergquist17a.html
%V 68
%X Research in oncology quality of care and health outcomes has been limited by the difficulty of identifying cancer stage in health care claims data. Using linked cancer registry and Medicare claims data, we develop a tool for classifying lung cancer patients receiving chemotherapy into early vs. late stage cancer by (i) deploying ensemble machine learning for prediction, (ii) establishing a set of classification rules for the predicted probabilities, and (iii) considering an augmented set of administrative claims data. We find our ensemble machine learning algorithm with a classification rule defined by the median substantially outperforms an existing clinical decision tree for this problem, yielding full sample performance of 93% sensitivity, 92% specificity, and 93% accuracy. This work has the potential for broad applicability as provider organizations, payers, and policy makers seek to measure quality and outcomes of cancer care and improve on risk adjustment methods.
APA
Bergquist, S. L., Brooks, G. A., Keating, N. L., Landrum, M. B., & Rose, S. (2017). Classifying Lung Cancer Severity with Ensemble Machine Learning in Health Care Claims Data. Proceedings of the 2nd Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 68:25-38. Available from https://proceedings.mlr.press/v68/bergquist17a.html.
