Clinically-aligned Multi-modal Chest X-ray Classification

Phillip Sloan, Edwin Simpson, Majid Mirmehdi
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:228-242, 2026.

Abstract

Radiology is essential to modern healthcare, yet rising demand and staffing shortages continue to pose major challenges. Recent advances in artificial intelligence have the potential to support radiologists and help address these challenges. Given its widespread use and clinical importance, chest X-ray classification is well suited to augment radiologists' workflows. However, most existing approaches rely solely on single-view, image-level inputs, ignoring the structured clinical information and multi-image studies available at the time of reporting. In this work, we introduce CaMCheX, a multimodal transformer-based framework that aligns multi-view chest X-ray studies with structured clinical data to better reflect how clinicians make diagnostic decisions. Our architecture employs view-specific ConvNeXt encoders for frontal and lateral chest radiographs, whose features are fused with clinical indications, history and vital signs using a transformer fusion module. This design enables the model to generate context-aware representations that mirror the reasoning in clinical practice. Our results exceed the state of the art for both the original MIMIC-CXR dataset and the more recent CXR-LT benchmarks, and highlight the value of clinically grounded multimodal alignment for advancing chest X-ray classification.

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-sloan26a,
  title     = {Clinically-aligned Multi-modal Chest X-ray Classification},
  author    = {Sloan, Phillip and Simpson, Edwin and Mirmehdi, Majid},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages     = {228--242},
  year      = {2026},
  editor    = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume    = {297},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/sloan26a/sloan26a.pdf},
  url       = {https://proceedings.mlr.press/v297/sloan26a.html},
  abstract  = {Radiology is essential to modern healthcare, yet rising demand and staffing shortages continue to pose major challenges. Recent advances in artificial intelligence have the potential to support radiologists and help address these challenges. Given its widespread use and clinical importance, chest X-ray classification is well suited to augment radiologists' workflows. However, most existing approaches rely solely on single-view, image-level inputs, ignoring the structured clinical information and multi-image studies available at the time of reporting. In this work, we introduce CaMCheX, a multimodal transformer-based framework that aligns multi-view chest X-ray studies with structured clinical data to better reflect how clinicians make diagnostic decisions. Our architecture employs view-specific ConvNeXt encoders for frontal and lateral chest radiographs, whose features are fused with clinical indications, history and vital signs using a transformer fusion module. This design enables the model to generate context-aware representations that mirror the reasoning in clinical practice. Our results exceed the state of the art for both the original MIMIC-CXR dataset and the more recent CXR-LT benchmarks, and highlight the value of clinically grounded multimodal alignment for advancing chest X-ray classification.}
}
Endnote
%0 Conference Paper
%T Clinically-aligned Multi-modal Chest X-ray Classification
%A Phillip Sloan
%A Edwin Simpson
%A Majid Mirmehdi
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-sloan26a
%I PMLR
%P 228--242
%U https://proceedings.mlr.press/v297/sloan26a.html
%V 297
%X Radiology is essential to modern healthcare, yet rising demand and staffing shortages continue to pose major challenges. Recent advances in artificial intelligence have the potential to support radiologists and help address these challenges. Given its widespread use and clinical importance, chest X-ray classification is well suited to augment radiologists' workflows. However, most existing approaches rely solely on single-view, image-level inputs, ignoring the structured clinical information and multi-image studies available at the time of reporting. In this work, we introduce CaMCheX, a multimodal transformer-based framework that aligns multi-view chest X-ray studies with structured clinical data to better reflect how clinicians make diagnostic decisions. Our architecture employs view-specific ConvNeXt encoders for frontal and lateral chest radiographs, whose features are fused with clinical indications, history and vital signs using a transformer fusion module. This design enables the model to generate context-aware representations that mirror the reasoning in clinical practice. Our results exceed the state of the art for both the original MIMIC-CXR dataset and the more recent CXR-LT benchmarks, and highlight the value of clinically grounded multimodal alignment for advancing chest X-ray classification.
APA
Sloan, P., Simpson, E. & Mirmehdi, M. (2026). Clinically-aligned Multi-modal Chest X-ray Classification. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:228-242. Available from https://proceedings.mlr.press/v297/sloan26a.html.