FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI

Luisa Gallée; Yiheng Xiong; Meinrad Beer; Michael Götz

Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:198-214, 2026.

Abstract

Densely annotated medical image datasets that capture not only diagnostic labels but also the underlying reasoning behind these diagnoses are scarce. Such reasoning-related annotations are essential for developing and evaluating explainable AI (xAI) models that reason similarly to radiologists: making correct predictions for the right reasons. To address this gap, we introduce FunnyNodules, a fully parameterized synthetic dataset designed for systematic analysis of attribute-based reasoning in medical AI models. The dataset generates abstract lung nodule–like shapes with controllable visual attributes such as roundness, margin sharpness, and spiculation. The target class is derived from a predefined attribute combination, allowing full control over the decision rule that links attributes to the diagnostic class. We demonstrate how FunnyNodules can be used in model-agnostic evaluations to assess whether models learn correct attribute–target relations, to interpret over- or underperformance in attribute prediction, and to analyze attention alignment with attribute-specific regions of interest. The framework is fully customizable, supporting variations in dataset complexity, target definitions, class balance, and beyond. With complete ground truth information, FunnyNodules provides a versatile foundation for developing, benchmarking, and conducting in-depth analyses of explainable AI methods in medical image analysis.

Cite this Paper

BibTeX

@InProceedings{pmlr-v315-gallee26a,
  title = {FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI},
  author = {Gall{\'e}e, Luisa and Xiong, Yiheng and Beer, Meinrad and G{\"o}tz, Michael},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = {198--214},
  year = {2026},
  editor = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = {315},
  series = {Proceedings of Machine Learning Research},
  month = {08--10 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/gallee26a/gallee26a.pdf},
  url = {https://proceedings.mlr.press/v315/gallee26a.html},
  abstract = {Densely annotated medical image datasets that capture not only diagnostic labels but also the underlying reasoning behind these diagnoses are scarce. Such reasoning-related annotations are essential for developing and evaluating explainable AI (xAI) models that reason similarly to radiologists: making correct predictions for the right reasons. To address this gap, we introduce FunnyNodules, a fully parameterized synthetic dataset designed for systematic analysis of attribute-based reasoning in medical AI models. The dataset generates abstract lung nodule–like shapes with controllable visual attributes such as roundness, margin sharpness, and spiculation. The target class is derived from a predefined attribute combination, allowing full control over the decision rule that links attributes to the diagnostic class. We demonstrate how FunnyNodules can be used in model-agnostic evaluations to assess whether models learn correct attribute–target relations, to interpret over- or underperformance in attribute prediction, and to analyze attention alignment with attribute-specific regions of interest. The framework is fully customizable, supporting variations in dataset complexity, target definitions, class balance, and beyond. With complete ground truth information, FunnyNodules provides a versatile foundation for developing, benchmarking, and conducting in-depth analyses of explainable AI methods in medical image analysis.}
}

Endnote

%0 Conference Paper
%T FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI
%A Luisa Gallée
%A Yiheng Xiong
%A Meinrad Beer
%A Michael Götz
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-gallee26a
%I PMLR
%P 198--214
%U https://proceedings.mlr.press/v315/gallee26a.html
%V 315
%X Densely annotated medical image datasets that capture not only diagnostic labels but also the underlying reasoning behind these diagnoses are scarce. Such reasoning-related annotations are essential for developing and evaluating explainable AI (xAI) models that reason similarly to radiologists: making correct predictions for the right reasons. To address this gap, we introduce FunnyNodules, a fully parameterized synthetic dataset designed for systematic analysis of attribute-based reasoning in medical AI models. The dataset generates abstract lung nodule–like shapes with controllable visual attributes such as roundness, margin sharpness, and spiculation. The target class is derived from a predefined attribute combination, allowing full control over the decision rule that links attributes to the diagnostic class. We demonstrate how FunnyNodules can be used in model-agnostic evaluations to assess whether models learn correct attribute–target relations, to interpret over- or underperformance in attribute prediction, and to analyze attention alignment with attribute-specific regions of interest. The framework is fully customizable, supporting variations in dataset complexity, target definitions, class balance, and beyond. With complete ground truth information, FunnyNodules provides a versatile foundation for developing, benchmarking, and conducting in-depth analyses of explainable AI methods in medical image analysis.

APA

Gallée, L., Xiong, Y., Beer, M. & Götz, M. (2026). FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:198-214. Available from https://proceedings.mlr.press/v315/gallee26a.html.
