Enhancing Wrist Fracture Detection through LLM-Powered Data Extraction and Knowledge-Based Ensemble Learning

Serge Didenko Vasylechko; Andy Tsai; Onur Afacan; Sila Kurugol

Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:1627-1637, 2026.

Abstract

The accuracy and generalization of deep learning models for fracture detection and classification in wrist radiographs are often limited by the scarcity of high-quality annotated data and class imbalances. Traditional annotation methods are time-consuming, expensive and prone to inter-observer variability \cite{rajpurkar2017mura}. To address these challenges, we developed an automated, cost-free approach to extract structured information from radiology reports, such as fracture type, location and severity. Our technique incorporates methods introduced by MedPrompt \cite{nori2023can}, and leverages domain expertise for group-based sampling \cite{khan2024knowledge}. Using these structured language labels alongside a pre-trained YOLO v7 backbone \cite{nagy2022pediatric, ciri2023bonefracture}, which initially demonstrated low accuracy scores on our clinical data, we were able to selectively fine-tune the model in a pseudo-blind manner. This approach utilized the extracted language labels without requiring expert annotations for training. We curated a large dataset of almost 3,000 pediatric wrist X-ray images and their corresponding radiology reports. Validation and testing were conducted on a smaller subset of 300 expert-annotated images. Our findings indicate that this pseudo-blind training strategy significantly enhances the base accuracy of the pre-trained model, achieving performance comparable to models fine-tuned with meticulously labeled expert annotations. Specifically, we improved the mean Average Precision (mAP) detection score for true positives related to fractures from 76% to 83%. Additionally, we observed improvements in precision and recall metrics for fracture detection. By integrating prompt-based information extraction with knowledge-based grouping, we achieved a robust and effective model for fracture detection.
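
The abstract does not spell out the label schema or the sampling procedure, so the following is only a minimal sketch of how structured report labels might feed a group-based sampler. The `ReportLabel` fields, the example label values, and the `group_balanced_sample` helper are all illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportLabel:
    """Hypothetical structured label extracted from one radiology report."""
    image_id: str
    fracture_type: str  # illustrative values, e.g. "buckle" or "none"
    location: str       # illustrative values, e.g. "distal radius"
    severity: str       # illustrative values, e.g. "mild"


def group_balanced_sample(labels, per_group, seed=0):
    """Draw up to `per_group` labels from each (fracture_type, location) group.

    This is an assumed stand-in for group-based sampling: it caps large
    groups so that rare fracture groups are not swamped by common ones.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for label in labels:
        groups[(label.fracture_type, label.location)].append(label)

    sample = []
    for key in sorted(groups):  # sorted for a reproducible group order
        members = groups[key]
        k = min(per_group, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Toy, made-up labels with a strong class imbalance (8 / 2 / 20).
labels = (
    [ReportLabel(f"a{i}", "buckle", "distal radius", "mild") for i in range(8)]
    + [ReportLabel(f"b{i}", "greenstick", "distal ulna", "moderate") for i in range(2)]
    + [ReportLabel(f"c{i}", "none", "none", "none") for i in range(20)]
)

balanced = group_balanced_sample(labels, per_group=2)
print(len(balanced))  # 3 groups, 2 labels each
```

Capping each group at the same size is one simple way to counter class imbalance; a real pipeline would likely define groups with clinical input and might oversample rare groups instead of capping common ones.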

Cite this Paper

BibTeX

@InProceedings{pmlr-v301-vasylechko26a,
  title = 	 {Enhancing Wrist Fracture Detection through LLM-Powered Data Extraction and Knowledge-Based Ensemble Learning},
  author =       {Vasylechko, Serge Didenko and Tsai, Andy and Afacan, Onur and Kurugol, Sila},
  booktitle = 	 {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {1627--1637},
  year = 	 {2026},
  editor = 	 {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = 	 {301},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v301/main/assets/vasylechko26a/vasylechko26a.pdf},
  url = 	 {https://proceedings.mlr.press/v301/vasylechko26a.html},
  abstract = 	 {The accuracy and generalization of deep learning models for fracture detection and classification in wrist radiographs are often limited by the scarcity of high-quality annotated data and class imbalances. Traditional annotation methods are time-consuming, expensive and prone to inter-observer variability \cite{rajpurkar2017mura}. To address these challenges, we developed an automated, cost-free approach to extract structured information from radiology reports, such as fracture type, location and severity. Our technique incorporates methods introduced by MedPrompt \cite{nori2023can}, and leverages domain expertise for group-based sampling \cite{khan2024knowledge}. Using these structured language labels alongside a pre-trained YOLO v7 backbone \cite{nagy2022pediatric, ciri2023bonefracture}, which initially demonstrated low accuracy scores on our clinical data, we were able to selectively fine-tune the model in a pseudo-blind manner. This approach utilized the extracted language labels without requiring expert annotations for training. We curated a large dataset of almost 3,000 pediatric wrist X-ray images and their corresponding radiology reports. Validation and testing were conducted on a smaller subset of 300 expert-annotated images. Our findings indicate that this pseudo-blind training strategy significantly enhances the base accuracy of the pre-trained model, achieving performance comparable to models fine-tuned with meticulously labeled expert annotations. Specifically, we improved the mean Average Precision (mAP) detection score for true positives related to fractures from 76% to 83%. Additionally, we observed improvements in precision and recall metrics for fracture detection. By integrating prompt-based information extraction with knowledge-based grouping, we achieved a robust and effective model for fracture detection.}
}

Endnote

%0 Conference Paper
%T Enhancing Wrist Fracture Detection through LLM-Powered Data Extraction and Knowledge-Based Ensemble Learning
%A Serge Didenko Vasylechko
%A Andy Tsai
%A Onur Afacan
%A Sila Kurugol
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-vasylechko26a
%I PMLR
%P 1627--1637
%U https://proceedings.mlr.press/v301/vasylechko26a.html
%V 301
%X The accuracy and generalization of deep learning models for fracture detection and classification in wrist radiographs are often limited by the scarcity of high-quality annotated data and class imbalances. Traditional annotation methods are time-consuming, expensive and prone to inter-observer variability \cite{rajpurkar2017mura}. To address these challenges, we developed an automated, cost-free approach to extract structured information from radiology reports, such as fracture type, location and severity. Our technique incorporates methods introduced by MedPrompt \cite{nori2023can}, and leverages domain expertise for group-based sampling \cite{khan2024knowledge}. Using these structured language labels alongside a pre-trained YOLO v7 backbone \cite{nagy2022pediatric, ciri2023bonefracture}, which initially demonstrated low accuracy scores on our clinical data, we were able to selectively fine-tune the model in a pseudo-blind manner. This approach utilized the extracted language labels without requiring expert annotations for training. We curated a large dataset of almost 3,000 pediatric wrist X-ray images and their corresponding radiology reports. Validation and testing were conducted on a smaller subset of 300 expert-annotated images. Our findings indicate that this pseudo-blind training strategy significantly enhances the base accuracy of the pre-trained model, achieving performance comparable to models fine-tuned with meticulously labeled expert annotations. Specifically, we improved the mean Average Precision (mAP) detection score for true positives related to fractures from 76% to 83%. Additionally, we observed improvements in precision and recall metrics for fracture detection. By integrating prompt-based information extraction with knowledge-based grouping, we achieved a robust and effective model for fracture detection.

APA

Vasylechko, S. D., Tsai, A., Afacan, O., & Kurugol, S. (2026). Enhancing Wrist Fracture Detection through LLM-Powered Data Extraction and Knowledge-Based Ensemble Learning. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:1627-1637. Available from https://proceedings.mlr.press/v301/vasylechko26a.html.
