An Evaluation of the Doctor-Interpretability of Generalized Additive Models with Interactions
Proceedings of the 5th Machine Learning for Healthcare Conference, PMLR 126:46-79, 2020.
Abstract
Applying machine learning in healthcare can be problematic because predictions
may be biased, lack robustness, and rely too heavily on correlations.
Interpretable machine learning can mitigate these issues by exposing gaps in
problem formalization and by placing the responsibility for meeting additional
desiderata of machine learning systems on human practitioners. Generalized additive
models with interactions are transparent: their modular one- and two-dimensional
risk functions can be reviewed individually and, if necessary, removed. The key
objective of this study
is to determine whether doctors can interpret these models well enough to deploy
them safely in a clinical setting. To this end, we simulated the review process
with twelve clinicians, who examined eight risk functions trained on a clinical
task, and we collected information about objective and subjective factors of
interpretability. The ratio of correct
answers for dichotomous statements covering important properties of the risk
functions was $0.83 \pm 0.02$ ($n = 360$), and the median of the participants'
certainty of having understood them correctly was Certain ($n = 96$) on a
seven-level Likert scale ($1$ = Very Uncertain to $7$ = Very Certain). These
results suggest that doctors can correctly interpret
the risk functions of generalized additive models with interactions and feel
confident in doing so. However, the evaluation also identified several
interpretability issues and showed that the interpretability of generalized
additive models depends on the complexity of their risk functions.
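
In this model class, the prediction decomposes as $g(\mathbb{E}[y]) = \beta_0 + \sum_i f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j)$, so each one-dimensional $f_i$ and two-dimensional $f_{ij}$ can be plotted and judged on its own. The sketch below illustrates such a review workflow; it assumes the interpret library's ExplainableBoostingClassifier as one widely available GA2M implementation, and the dataset and number of interaction terms are illustrative stand-ins rather than details from the paper.

```python
# Minimal sketch (not the authors' exact pipeline): fit a GAM with
# interactions and inspect its modular risk functions, assuming the
# interpret library's ExplainableBoostingClassifier as the GA2M backend.
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer

# Illustrative stand-in dataset; the paper's clinical task is not used here.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# One shape function per feature plus a small number of pairwise
# interaction terms, i.e. one- and two-dimensional risk functions.
ebm = ExplainableBoostingClassifier(interactions=5)
ebm.fit(X, y)

print(ebm.term_names_)       # names of the fitted 1D and 2D terms
show(ebm.explain_global())   # per-term risk-function plots for review
```

Because every term is additive, a reviewer can drop an implausible risk function without retraining the rest of the model, which is the review-and-remove workflow the study evaluates.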