Why predicting risk can’t identify ‘risk factors’: empirical assessment of model stability in machine learning across observational health databases

Aniek F. Markus, Peter R. Rijnbeek, Jenna M. Reps
Proceedings of the 7th Machine Learning for Healthcare Conference, PMLR 182:828-852, 2022.

Abstract

People often interpret clinical prediction models to detect ‘risk factors’, i.e. to identify variables associated with the outcome. We shed light on the stability of prediction models by performing a large-scale experiment in which we developed over 450 prediction models using LASSO logistic regression and investigated how the models change across databases (care settings) and phenotype definitions. Our results show that model stability, as measured by the similarity of the selected variables, is poor across the prediction tasks but slightly better for the top (i.e. most important) variables. Differences in the top variables are mostly due to database choice rather than to using different target population and/or outcome phenotype definitions. However, this means using a different database might lead to finding different ‘risk factors’. Furthermore, we found that the effect (i.e. sign) of variables is not always the same across models, which makes clinical interpretation of potential ‘risk factors’ difficult. This study shows it is important to be careful when using LASSO regression to identify ‘risk factors’ and not to over-interpret the developed models in general. For ‘risk factor’ detection, we recommend investigating model robustness across settings or using alternative methods (e.g. univariate analysis).
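As a rough illustration of the stability measure described in the abstract (similarity of the variables selected by LASSO logistic regression across databases), the sketch below fits L1-penalized logistic regression on two synthetic stand-in datasets and compares the selected variables via Jaccard similarity and coefficient-sign agreement. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data, the regularization strength `C`, and the helper names are illustrative.

```python
# Minimal sketch (not the authors' pipeline): compare variables selected by
# L1-penalized (LASSO) logistic regression across two hypothetical databases.
import numpy as np
from sklearn.linear_model import LogisticRegression

def lasso_selected(X, y, C=0.1):
    """Fit L1-penalized logistic regression and return {variable index: coefficient sign}
    for the variables with non-zero coefficients. C is an illustrative choice."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C, max_iter=1000)
    model.fit(X, y)
    coefs = model.coef_.ravel()
    return {j: np.sign(coefs[j]) for j in np.flatnonzero(coefs)}

def stability(sel_a, sel_b):
    """Jaccard similarity of the two selected-variable sets, plus the fraction of
    shared variables whose coefficient signs agree."""
    a, b = set(sel_a), set(sel_b)
    shared = a & b
    jaccard = len(shared) / len(a | b) if (a | b) else 1.0
    sign_agreement = np.mean([sel_a[j] == sel_b[j] for j in shared]) if shared else float("nan")
    return jaccard, sign_agreement

# Synthetic stand-ins for two databases (in the paper these would be real cohorts).
rng = np.random.default_rng(0)
X_a, X_b = rng.normal(size=(1000, 50)), rng.normal(size=(1000, 50))
y_a = (X_a[:, 0] - X_a[:, 1] + rng.normal(size=1000) > 0).astype(int)
y_b = (X_b[:, 0] - X_b[:, 1] + rng.normal(size=1000) > 0).astype(int)

jaccard, signs = stability(lasso_selected(X_a, y_a), lasso_selected(X_b, y_b))
print(f"Jaccard similarity: {jaccard:.2f}, sign agreement: {signs:.2f}")
```

Low Jaccard similarity between the two fits, or disagreement in coefficient signs on the shared variables, would correspond to the kind of instability the paper reports across databases and phenotype definitions.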

Cite this Paper


BibTeX
@InProceedings{pmlr-v182-markus22a,
  title     = {Why predicting risk can’t identify ‘risk factors’: empirical assessment of model stability in machine learning across observational health databases},
  author    = {Markus, Aniek F. and Rijnbeek, Peter R. and Reps, Jenna M.},
  booktitle = {Proceedings of the 7th Machine Learning for Healthcare Conference},
  pages     = {828--852},
  year      = {2022},
  editor    = {Lipton, Zachary and Ranganath, Rajesh and Sendak, Mark and Sjoding, Michael and Yeung, Serena},
  volume    = {182},
  series    = {Proceedings of Machine Learning Research},
  month     = {05--06 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v182/markus22a/markus22a.pdf},
  url       = {https://proceedings.mlr.press/v182/markus22a.html},
  abstract  = {People often interpret clinical prediction models to detect ‘risk factors’, i.e. to identify variables associated to the outcome. We shed light on the stability of prediction models by performing a large-scale experiment developing over 450 prediction models using LASSO logistic regression and investigating model changes across databases (care settings) and phenotype definitions. Our results show that model stability, as measured by the similarity of selected variables, is poor across the prediction tasks but slightly better for the top (i.e. most important) variables. Differences in the top variables are mostly due to database choice and not due to using different target population and/or outcome phenotype definitions. However, this means using a different database might lead to finding different ‘risk factors’. Furthermore, we found the effect (i.e. sign) of variables is not always the same across models, which makes clinical interpretation of potential ‘risk factors’ difficult. This study shows it is important to be careful when using LASSO regression to identify ‘risk factors’ and not to over-interpret the developed models in general. For ‘risk factor’ detection, we recommend investigating model robustness across settings or using alternative methods (e.g. univariate analysis).}
}
APA
Markus, A.F., Rijnbeek, P.R. & Reps, J.M. (2022). Why predicting risk can’t identify ‘risk factors’: empirical assessment of model stability in machine learning across observational health databases. Proceedings of the 7th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 182:828-852. Available from https://proceedings.mlr.press/v182/markus22a.html.
