Error Profiling of Machine Learning Models: An Exploratory Visualization

Jeffrey Feng, Al Rahrooh, Alex Bui
Proceedings of the 10th Machine Learning for Healthcare Conference, PMLR 298, 2025.

Abstract

While data-driven predictive models are increasingly used in healthcare, their clinical translation remains limited—partly due to challenges in evaluating model performance across design choices. Existing explainability methods often focus on intra-model interpretability but fall short in supporting inter-model comparisons. We present a visualization-based error profiling method that facilitates comparative evaluation by highlighting overlaps and differences in model predictions. Our matrix-based visualization maps which models incorrectly classify which patient subgroups, with color intensity indicating the number of misclassified patients. This approach enables deeper insight into which (sub)populations are consistently (in)correctly classified across models, helping uncover patterns of model (dis)agreement and assess the impact of modeling decisions. We demonstrate our visualization method in four healthcare use cases: 1) missing data imputation in a longitudinal nutritional dataset; 2) feature set analysis using randomized controlled trial data; 3) end-model technical performance in cardiac morbidity prediction; and 4) data modality comparison using a dual-source lung cancer dataset with longitudinal and radiomic features. To evaluate the visualization, we obtained expert feedback and qualitative assessments of decision-making insights. Survey results—across clinicians, computer scientists, and medical informaticians—indicated that our method provides an interpretable and intuitive way to compare model error distributions by highlighting patterns within correctly and incorrectly classified subpopulations across different models. Our comprehensible error profiling approach represents an initial step toward a systematic framework for improving model assessment in clinical tasks. Through this framework, both model developers and end users can better understand when and where a given model is appropriate for real-world clinical deployment.
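The core idea described in the abstract, a matrix whose rows are models and whose columns are patient subgroups, with each cell counting misclassified patients, can be sketched in a few lines. The model names, subgroup labels, and toy predictions below are illustrative assumptions, not data or code from the paper; in the actual visualization, each count would be rendered as a color intensity.

```python
def error_profile_matrix(y_true, predictions, subgroups):
    """Build a {model: {subgroup: misclassified-patient count}} matrix.

    y_true:      list of true labels, one per patient
    predictions: dict mapping model name -> list of predicted labels
    subgroups:   list of subgroup labels, one per patient
    """
    groups = sorted(set(subgroups))
    matrix = {}
    for model, y_pred in predictions.items():
        counts = {g: 0 for g in groups}
        for truth, pred, group in zip(y_true, y_pred, subgroups):
            if pred != truth:
                counts[group] += 1
        matrix[model] = counts
    return matrix

# Toy example: two hypothetical models, two subgroups of four patients each.
y_true    = [1, 0, 1, 0, 1, 0, 1, 0]
subgroups = ["A", "A", "A", "A", "B", "B", "B", "B"]
preds = {
    "logreg":  [1, 0, 0, 0, 1, 1, 1, 0],
    "xgboost": [1, 1, 1, 1, 1, 0, 1, 0],
}
profile = error_profile_matrix(y_true, preds, subgroups)
```

Reading across a row shows where one model's errors concentrate; reading down a column shows which subgroups are consistently (in)correctly classified across models, which is the inter-model comparison the paper targets.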

Cite this Paper


BibTeX
@InProceedings{pmlr-v298-feng25a,
  title     = {Error Profiling of Machine Learning Models: An Exploratory Visualization},
  author    = {Feng, Jeffrey and Rahrooh, Al and Bui, Alex},
  booktitle = {Proceedings of the 10th Machine Learning for Healthcare Conference},
  year      = {2025},
  editor    = {Agrawal, Monica and Deshpande, Kaivalya and Engelhard, Matthew and Joshi, Shalmali and Tang, Shengpu and Urteaga, Iñigo},
  volume    = {298},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v298/main/assets/feng25a/feng25a.pdf},
  url       = {https://proceedings.mlr.press/v298/feng25a.html},
  abstract  = {While data-driven predictive models are increasingly used in healthcare, their clinical translation remains limited—partly due to challenges in evaluating model performance across design choices. Existing explainability methods often focus on intra-model interpretability but fall short in supporting inter-model comparisons. We present a visualization-based error profiling method that facilitates comparative evaluation by highlighting overlaps and differences in model predictions. Our matrix-based visualization maps which models incorrectly classify which patient subgroups, with color intensity indicating the number of misclassified patients. This approach enables deeper insight into which (sub)populations are consistently (in)correctly classified across models, helping uncover patterns of model (dis)agreement and assess the impact of modeling decisions. We demonstrate our visualization method in four healthcare use cases: 1) missing data imputation in a longitudinal nutritional dataset; 2) feature set analysis using randomized controlled trial data; 3) end-model technical performance in cardiac morbidity prediction; and 4) data modality comparison using a dual-source lung cancer dataset with longitudinal and radiomic features. To evaluate the visualization, we obtained expert feedback and qualitative assessments of decision-making insights. Survey results—across clinicians, computer scientists, and medical informaticians—indicated that our method provides an interpretable and intuitive way to compare model error distributions by highlighting patterns within correctly and incorrectly classified subpopulations across different models. Our comprehensible error profiling approach represents an initial step toward a systematic framework for improving model assessment in clinical tasks. Through this framework, both model developers and end users can better understand when and where a given model is appropriate for real-world clinical deployment.}
}
Endnote
%0 Conference Paper
%T Error Profiling of Machine Learning Models: An Exploratory Visualization
%A Jeffrey Feng
%A Al Rahrooh
%A Alex Bui
%B Proceedings of the 10th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Monica Agrawal
%E Kaivalya Deshpande
%E Matthew Engelhard
%E Shalmali Joshi
%E Shengpu Tang
%E Iñigo Urteaga
%F pmlr-v298-feng25a
%I PMLR
%U https://proceedings.mlr.press/v298/feng25a.html
%V 298
%X While data-driven predictive models are increasingly used in healthcare, their clinical translation remains limited—partly due to challenges in evaluating model performance across design choices. Existing explainability methods often focus on intra-model interpretability but fall short in supporting inter-model comparisons. We present a visualization-based error profiling method that facilitates comparative evaluation by highlighting overlaps and differences in model predictions. Our matrix-based visualization maps which models incorrectly classify which patient subgroups, with color intensity indicating the number of misclassified patients. This approach enables deeper insight into which (sub)populations are consistently (in)correctly classified across models, helping uncover patterns of model (dis)agreement and assess the impact of modeling decisions. We demonstrate our visualization method in four healthcare use cases: 1) missing data imputation in a longitudinal nutritional dataset; 2) feature set analysis using randomized controlled trial data; 3) end-model technical performance in cardiac morbidity prediction; and 4) data modality comparison using a dual-source lung cancer dataset with longitudinal and radiomic features. To evaluate the visualization, we obtained expert feedback and qualitative assessments of decision-making insights. Survey results—across clinicians, computer scientists, and medical informaticians—indicated that our method provides an interpretable and intuitive way to compare model error distributions by highlighting patterns within correctly and incorrectly classified subpopulations across different models. Our comprehensible error profiling approach represents an initial step toward a systematic framework for improving model assessment in clinical tasks. Through this framework, both model developers and end users can better understand when and where a given model is appropriate for real-world clinical deployment.
APA
Feng, J., Rahrooh, A. & Bui, A. (2025). Error Profiling of Machine Learning Models: An Exploratory Visualization. Proceedings of the 10th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 298. Available from https://proceedings.mlr.press/v298/feng25a.html.