ML4H Auditing: From Paper to Practice

Luis Oala; Jana Fehr; Luca Gilli; Pradeep Balachandran; Alixandro Werneck Leite; Saul Calderon-Ramirez; Danny Xie Li; Gabriel Nobis; Erick Alejandro Muñoz Alvarado; Giovanna Jaramillo-Gutierrez; Christian Matek; Arun Shroff; Ferath Kherif; Bruno Sanguinetti; Thomas Wiegand

Proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 136:280-317, 2020.

Abstract

Healthcare systems are currently adapting to digital technologies, producing large quantities of novel data. Based on these data, machine-learning algorithms have been developed to support practitioners in labor-intensive workflows such as diagnosis, prognosis, triage or treatment of disease. However, their translation into medical practice is often hampered by a lack of careful evaluation in different settings. Efforts have started worldwide to establish guidelines for evaluating machine learning for health (ML4H) tools, highlighting the necessity to evaluate models for bias, interpretability, robustness, and possible failure modes. However, testing and adopting these guidelines in practice remains an open challenge. In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics. The assessment comprises dimensions such as bias, interpretability, and robustness. Our results highlight the importance of fine-grained and case-adapted quality assessment, provide support for incorporating proposed quality assessment considerations of ML4H during the entire development life cycle, and suggest improvements for future ML4H reference evaluation frameworks.

Cite this Paper

BibTeX

@InProceedings{pmlr-v136-oala20a,
  title = 	 {ML4H Auditing: From Paper to Practice},
  author =       {Oala, Luis and Fehr, Jana and Gilli, Luca and Balachandran, Pradeep and Leite, Alixandro Werneck and Calderon-Ramirez, Saul and Li, Danny Xie and Nobis, Gabriel and Alvarado, Erick Alejandro Mu\~noz and Jaramillo-Gutierrez, Giovanna and Matek, Christian and Shroff, Arun and Kherif, Ferath and Sanguinetti, Bruno and Wiegand, Thomas},
  booktitle = 	 {Proceedings of the Machine Learning for Health NeurIPS Workshop},
  pages = 	 {280--317},
  year = 	 {2020},
  editor = 	 {Alsentzer, Emily and McDermott, Matthew B. A. and Falck, Fabian and Sarkar, Suproteem K. and Roy, Subhrajit and Hyland, Stephanie L.},
  volume = 	 {136},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {11 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v136/oala20a/oala20a.pdf},
  url = 	 {https://proceedings.mlr.press/v136/oala20a.html},
  abstract = 	 {Healthcare systems are currently adapting to digital technologies, producing large quantities of novel data. Based on these data, machine-learning algorithms have been developed to support practitioners in labor-intensive workflows such as diagnosis, prognosis, triage or treatment of disease. However, their translation into medical practice is often hampered by a lack of careful evaluation in different settings. Efforts have started worldwide to establish guidelines for evaluating machine learning for health (ML4H) tools, highlighting the necessity to evaluate models for bias, interpretability, robustness, and possible failure modes. However, testing and adopting these guidelines in practice remains an open challenge. In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics. The assessment comprises dimensions such as bias, interpretability, and robustness. Our results highlight the importance of fine-grained and case-adapted quality assessment, provide support for incorporating proposed quality assessment considerations of ML4H during the entire development life cycle, and suggest improvements for future ML4H reference evaluation frameworks.}
}

Endnote

%0 Conference Paper
%T ML4H Auditing: From Paper to Practice
%A Luis Oala
%A Jana Fehr
%A Luca Gilli
%A Pradeep Balachandran
%A Alixandro Werneck Leite
%A Saul Calderon-Ramirez
%A Danny Xie Li
%A Gabriel Nobis
%A Erick Alejandro Muñoz Alvarado
%A Giovanna Jaramillo-Gutierrez
%A Christian Matek
%A Arun Shroff
%A Ferath Kherif
%A Bruno Sanguinetti
%A Thomas Wiegand
%B Proceedings of the Machine Learning for Health NeurIPS Workshop
%C Proceedings of Machine Learning Research
%D 2020
%E Emily Alsentzer
%E Matthew B. A. McDermott
%E Fabian Falck
%E Suproteem K. Sarkar
%E Subhrajit Roy
%E Stephanie L. Hyland
%F pmlr-v136-oala20a
%I PMLR
%P 280--317
%U https://proceedings.mlr.press/v136/oala20a.html
%V 136
%X Healthcare systems are currently adapting to digital technologies, producing large quantities of novel data. Based on these data, machine-learning algorithms have been developed to support practitioners in labor-intensive workflows such as diagnosis, prognosis, triage or treatment of disease. However, their translation into medical practice is often hampered by a lack of careful evaluation in different settings. Efforts have started worldwide to establish guidelines for evaluating machine learning for health (ML4H) tools, highlighting the necessity to evaluate models for bias, interpretability, robustness, and possible failure modes. However, testing and adopting these guidelines in practice remains an open challenge. In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics. The assessment comprises dimensions such as bias, interpretability, and robustness. Our results highlight the importance of fine-grained and case-adapted quality assessment, provide support for incorporating proposed quality assessment considerations of ML4H during the entire development life cycle, and suggest improvements for future ML4H reference evaluation frameworks.

APA

Oala, L., Fehr, J., Gilli, L., Balachandran, P., Leite, A.W., Calderon-Ramirez, S., Li, D.X., Nobis, G., Alvarado, E.A.M., Jaramillo-Gutierrez, G., Matek, C., Shroff, A., Kherif, F., Sanguinetti, B. & Wiegand, T. (2020). ML4H Auditing: From Paper to Practice. Proceedings of the Machine Learning for Health NeurIPS Workshop, in Proceedings of Machine Learning Research 136:280-317. Available from https://proceedings.mlr.press/v136/oala20a.html.
