An Evaluation of the Doctor-Interpretability of Generalized Additive Models with Interactions

Stefan Hegselmann; Thomas Volkert; Hendrik Ohlenburg; Antje Gottschalk; Martin Dugas; Christian Ertmer

Proceedings of the 5th Machine Learning for Healthcare Conference, PMLR 126:46-79, 2020.

Abstract

Applying machine learning in healthcare can be problematic because predictions might be biased, can lack robustness, and are prone to overly rely on correlations. Interpretable machine learning can mitigate these issues by visualizing gaps in problem formalization and putting the responsibility to meet additional desiderata of machine learning systems on human practitioners. Generalized additive models with interactions are transparent, with modular one- and two-dimensional risk functions that can be reviewed and, if necessary, removed. The key objective of this study is to determine whether these models can be interpreted by doctors to safely deploy them in a clinical setting. To this end, we simulated the review process of eight risk functions trained on a clinical task with twelve clinicians and collected information about objective and subjective factors of interpretability. The ratio of correct answers for dichotomous statements covering important properties of risk functions was $0.83 \pm 0.02$ ($n = 360$) and the median of the participants’ certainty to correctly understand them was Certain ($n = 96$) on a seven-level Likert scale (one = Very Uncertain to seven = Very Certain). These results suggest that doctors can correctly interpret risk functions of generalized additive models with interactions and also feel confident to do so. However, the evaluation also identified several interpretability issues and it showed that interpretability of generalized additive models depends on the complexity of risk functions.

Cite this Paper

BibTeX


@InProceedings{pmlr-v126-hegselmann20a,
  title = 	 {An Evaluation of the Doctor-Interpretability of Generalized Additive Models with Interactions},
  author =       {Hegselmann, Stefan and Volkert, Thomas and Ohlenburg, Hendrik and Gottschalk, Antje and Dugas, Martin and Ertmer, Christian},
  booktitle = 	 {Proceedings of the 5th Machine Learning for Healthcare Conference},
  pages = 	 {46--79},
  year = 	 {2020},
  editor = 	 {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume = 	 {126},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--08 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v126/hegselmann20a/hegselmann20a.pdf},
  url = 	 {https://proceedings.mlr.press/v126/hegselmann20a.html},
  abstract = 	 {Applying machine learning in healthcare can be problematic because predictions
might be biased, can lack robustness, and are prone to overly rely on correlations.
Interpretable machine learning can mitigate these issues by visualizing gaps in
problem formalization and putting the responsibility to meet additional desiderata
of machine learning systems on human practitioners. Generalized additive models
with interactions are transparent, with modular one- and two-dimensional risk functions
that can be reviewed and, if necessary, removed. The key objective of this study
is to determine whether these models can be interpreted by doctors to safely deploy
them in a clinical setting. To this end, we simulated the review process of eight
risk functions trained on a clinical task with twelve clinicians and collected information
about objective and subjective factors of interpretability. The ratio of correct
answers for dichotomous statements covering important properties of risk functions
was $0.83\pm 0.02$ (n = 360) and the median of the participants’ certainty to correctly
understand them was Certain ($n = 96$) on a seven-level Likert scale (one = Very Uncertain
to seven = Very Certain). These results suggest that doctors can correctly interpret
risk functions of generalized additive models with interactions and also feel confident
to do so. However, the evaluation also identified several interpretability issues
and it showed that interpretability of generalized additive models depends on the
complexity of risk functions.
}
}

Endnote

%0 Conference Paper
%T An Evaluation of the Doctor-Interpretability of Generalized Additive Models with Interactions
%A Stefan Hegselmann
%A Thomas Volkert
%A Hendrik Ohlenburg
%A Antje Gottschalk
%A Martin Dugas
%A Christian Ertmer
%B Proceedings of the 5th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v126-hegselmann20a
%I PMLR
%P 46--79
%U https://proceedings.mlr.press/v126/hegselmann20a.html
%V 126
%X Applying machine learning in healthcare can be problematic because predictions
might be biased, can lack robustness, and are prone to overly rely on correlations.
Interpretable machine learning can mitigate these issues by visualizing gaps in
problem formalization and putting the responsibility to meet additional desiderata
of machine learning systems on human practitioners. Generalized additive models
with interactions are transparent, with modular one- and two-dimensional risk functions
that can be reviewed and, if necessary, removed. The key objective of this study
is to determine whether these models can be interpreted by doctors to safely deploy
them in a clinical setting. To this end, we simulated the review process of eight
risk functions trained on a clinical task with twelve clinicians and collected information
about objective and subjective factors of interpretability. The ratio of correct
answers for dichotomous statements covering important properties of risk functions
was $0.83\pm 0.02$ (n = 360) and the median of the participants’ certainty to correctly
understand them was Certain ($n = 96$) on a seven-level Likert scale (one = Very Uncertain
to seven = Very Certain). These results suggest that doctors can correctly interpret
risk functions of generalized additive models with interactions and also feel confident
to do so. However, the evaluation also identified several interpretability issues
and it showed that interpretability of generalized additive models depends on the
complexity of risk functions.

APA


Hegselmann, S., Volkert, T., Ohlenburg, H., Gottschalk, A., Dugas, M. & Ertmer, C. (2020). An Evaluation of the Doctor-Interpretability of Generalized Additive Models with Interactions. Proceedings of the 5th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 126:46-79. Available from https://proceedings.mlr.press/v126/hegselmann20a.html.
