Context-Aware Filtering of Unstructured Radiology Reports by Anatomical Region

Zakk Heile, Pranav Manjunath, Brian Lerner, Samuel Berchuck, Monica Agrawal, Timothy W. Dunn
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:864-885, 2026.

Abstract

Radiology reports contain essential clinical information but often remain in unstructured, free-text formats. Notably, multiple imaging examinations performed simultaneously (such as CT head, facial bones, and cervical spine in trauma cases) may be bundled into a single report that consolidates findings from all studies into one free-text document, written jointly. Because individual sentences may reference ambiguous or overlapping anatomy (e.g., “there is a fracture”), sentence-level anatomic classification—filtering a report to retain only findings relevant to a specific anatomical region—is essential for downstream tasks such as structured label extraction and for creating clean, bijective training data for radiology report generation models. While formatting differs across reports, the clinical language remains precise. Using that fact, we develop context-aware classical models with feature engineering that surpass trained neural networks and pre-trained language models. We show that the learned model weights generalize effectively to MIMIC-IV radiology reports and that our approach achieves near-optimal performance with only a small amount of labeled training data. Together, these results make our approach practical and reproducible for new settings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-heile26a,
  title     = {Context-Aware Filtering of Unstructured Radiology Reports by Anatomical Region},
  author    = {Heile, Zakk and Manjunath, Pranav and Lerner, Brian and Berchuck, Samuel and Agrawal, Monica and Dunn, Timothy W.},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages     = {864--885},
  year      = {2026},
  editor    = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gr{\"o}ger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume    = {297},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/heile26a/heile26a.pdf},
  url       = {https://proceedings.mlr.press/v297/heile26a.html},
  abstract  = {Radiology reports contain essential clinical information but often remain in unstructured, free-text formats. Notably, multiple imaging examinations performed simultaneously (such as {CT} head, facial bones, and cervical spine in trauma cases) may be bundled into a single report that consolidates findings from all studies into one free-text document, written jointly. Because individual sentences may reference ambiguous or overlapping anatomy (e.g., “there is a fracture”), sentence-level anatomic classification—filtering a report to retain only findings relevant to a specific anatomical region—is essential for downstream tasks such as structured label extraction and for creating clean, bijective training data for radiology report generation models. While formatting differs across reports, the clinical language remains precise. Using that fact, we develop context-aware classical models with feature engineering that surpass trained neural networks and pre-trained language models. We show that the learned model weights generalize effectively to {MIMIC}-{IV} radiology reports and that our approach achieves near-optimal performance with only a small amount of labeled training data. Together, these results make our approach practical and reproducible for new settings.}
}
Endnote
%0 Conference Paper
%T Context-Aware Filtering of Unstructured Radiology Reports by Anatomical Region
%A Zakk Heile
%A Pranav Manjunath
%A Brian Lerner
%A Samuel Berchuck
%A Monica Agrawal
%A Timothy W. Dunn
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-heile26a
%I PMLR
%P 864--885
%U https://proceedings.mlr.press/v297/heile26a.html
%V 297
%X Radiology reports contain essential clinical information but often remain in unstructured, free-text formats. Notably, multiple imaging examinations performed simultaneously (such as CT head, facial bones, and cervical spine in trauma cases) may be bundled into a single report that consolidates findings from all studies into one free-text document, written jointly. Because individual sentences may reference ambiguous or overlapping anatomy (e.g., “there is a fracture”), sentence-level anatomic classification—filtering a report to retain only findings relevant to a specific anatomical region—is essential for downstream tasks such as structured label extraction and for creating clean, bijective training data for radiology report generation models. While formatting differs across reports, the clinical language remains precise. Using that fact, we develop context-aware classical models with feature engineering that surpass trained neural networks and pre-trained language models. We show that the learned model weights generalize effectively to MIMIC-IV radiology reports and that our approach achieves near-optimal performance with only a small amount of labeled training data. Together, these results make our approach practical and reproducible for new settings.
APA
Heile, Z., Manjunath, P., Lerner, B., Berchuck, S., Agrawal, M. &amp; Dunn, T.W. (2026). Context-Aware Filtering of Unstructured Radiology Reports by Anatomical Region. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:864-885. Available from https://proceedings.mlr.press/v297/heile26a.html.