A Calibration Metric for Risk Scores with Survival Data

Steve Yadlowsky; Sanjay Basu; Lu Tian

Proceedings of the 4th Machine Learning for Healthcare Conference, PMLR 106:424-450, 2019.

Abstract

We study methods for assessing the calibration of a learned risk model in an independent validation cohort, that is, the degree to which the model systematically over- or under-estimates risk. Here, we advance methods for evaluating clinical risk prediction models by deriving a population parameter that measures the average calibration error of the predicted risk relative to the true risk, and by providing a method for its estimation and inference. Our approach improves upon commonly used goodness-of-fit tests, which depend on subjective bin thresholds and may yield misleading results. Rather than a simple P-value that conflates calibration error with sample size, it reports confidence intervals for the calibration error. This approach enables comparison among multiple risk prediction models and can guide model revision. We illustrate how our new method helps to understand the calibration of risk models that have been profoundly influential in clinical practice but controversial because of their potential miscalibration.
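
The abstract does not spell out the estimator. The sketch below is a simplified, hypothetical illustration of its central idea: report an interval estimate for a calibration error instead of a P-value, without binning predictions.

It is not the authors' method. It computes calibration-in-the-large at a fixed horizon t, defined as the mean predicted risk minus the Kaplan-Meier estimate of the observed event probability by t. It then attaches a percentile-bootstrap confidence interval.

The function names, the horizon, and the simulated data are assumptions made for illustration. Because this quantity is a signed whole-cohort average, over- and under-prediction in different risk strata can cancel.

```python
import numpy as np


def km_event_prob(time, event, horizon):
    """Kaplan-Meier estimate of P(T <= horizon) from right-censored data."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    order = np.argsort(time)
    t_sorted, e_sorted = time[order], event[order]
    # Distinct sorted times and the index where each first appears.
    uniq, first = np.unique(t_sorted, return_index=True)
    deaths = np.add.reduceat(e_sorted.astype(int), first)  # events at each distinct time
    at_risk = len(t_sorted) - first                         # subjects with time >= t
    keep = (uniq <= horizon) & (deaths > 0)
    surv = np.prod(1.0 - deaths[keep] / at_risk[keep])
    return float(1.0 - surv)


def calibration_in_the_large(pred_risk, time, event, horizon):
    """Mean predicted risk minus KM-observed risk at the horizon.

    Positive values indicate over-prediction on average (hypothetical helper).
    """
    return float(np.mean(pred_risk) - km_event_prob(time, event, horizon))


def bootstrap_ci(pred_risk, time, event, horizon, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    pred_risk, time, event = map(np.asarray, (pred_risk, time, event))
    n = len(time)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats[b] = calibration_in_the_large(pred_risk[idx], time[idx], event[idx], horizon)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    point = calibration_in_the_large(pred_risk, time, event, horizon)
    return point, (float(lo), float(hi))


if __name__ == "__main__":
    # Hypothetical simulation: exponential event times with independent
    # uniform censoring, and a model that inflates each true 5-year risk by 20%.
    rng = np.random.default_rng(1)
    n = 2000
    rate = rng.uniform(0.05, 0.3, size=n)
    event_time = rng.exponential(1 / rate)
    censor_time = rng.uniform(0, 10, size=n)
    time = np.minimum(event_time, censor_time)
    event = event_time <= censor_time
    true_risk = 1 - np.exp(-rate * 5.0)
    est, (lo, hi) = bootstrap_ci(np.clip(1.2 * true_risk, 0, 1), time, event, horizon=5.0)
    print(f"calibration-in-the-large: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The interval conveys both the size of the error and its uncertainty. A P-value alone would shrink toward zero as the cohort grows, even for a clinically negligible error.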

Cite this Paper

BibTeX

@InProceedings{pmlr-v106-yadlowsky19a,
  title     = {A Calibration Metric for Risk Scores with Survival Data},
  author    = {Yadlowsky, Steve and Basu, Sanjay and Tian, Lu},
  booktitle = {Proceedings of the 4th Machine Learning for Healthcare Conference},
  pages     = {424--450},
  year      = {2019},
  editor    = {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume    = {106},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--10 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v106/yadlowsky19a/yadlowsky19a.pdf},
  url       = {https://proceedings.mlr.press/v106/yadlowsky19a.html},
  abstract  = {We study methods for assessing the calibration of a learned risk model in an independent validation cohort, that is, the degree to which the model systematically over- or under-estimates risk. Here, we advance methods for evaluating clinical risk prediction models by deriving a population parameter that measures the average calibration error of the predicted risk relative to the true risk, and by providing a method for its estimation and inference. Our approach improves upon commonly used goodness-of-fit tests, which depend on subjective bin thresholds and may yield misleading results. Rather than a simple P-value that conflates calibration error with sample size, it reports confidence intervals for the calibration error. This approach enables comparison among multiple risk prediction models and can guide model revision. We illustrate how our new method helps to understand the calibration of risk models that have been profoundly influential in clinical practice but controversial because of their potential miscalibration.}
}

Endnote

%0 Conference Paper
%T A Calibration Metric for Risk Scores with Survival Data
%A Steve Yadlowsky
%A Sanjay Basu
%A Lu Tian
%B Proceedings of the 4th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2019
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v106-yadlowsky19a
%I PMLR
%P 424--450
%U https://proceedings.mlr.press/v106/yadlowsky19a.html
%V 106
%X We study methods for assessing the calibration of a learned risk model in an independent validation cohort, that is, the degree to which the model systematically over- or under-estimates risk. Here, we advance methods for evaluating clinical risk prediction models by deriving a population parameter that measures the average calibration error of the predicted risk relative to the true risk, and by providing a method for its estimation and inference. Our approach improves upon commonly used goodness-of-fit tests, which depend on subjective bin thresholds and may yield misleading results. Rather than a simple P-value that conflates calibration error with sample size, it reports confidence intervals for the calibration error. This approach enables comparison among multiple risk prediction models and can guide model revision. We illustrate how our new method helps to understand the calibration of risk models that have been profoundly influential in clinical practice but controversial because of their potential miscalibration.

APA

Yadlowsky, S., Basu, S., & Tian, L. (2019). A Calibration Metric for Risk Scores with Survival Data. Proceedings of the 4th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 106:424-450. Available from https://proceedings.mlr.press/v106/yadlowsky19a.html.
