MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models

Harshita Sharma, Valentina Salvatelli, Shaury Srivastav, Kenza Bouzid, Shruthi Bannur, Daniel C. Castro, Maximilian Ilse, Sam Bond-Taylor, Mercy Prasanna Ranjit, Fabian Falck, Fernando Pérez-García, Anton Schwaighofer, Hannah Richardson, Maria Wetscherek, Stephanie Hyland, Javier Alvarez-Valle
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:941-960, 2025.

Abstract

There is growing interest in applying AI to radiology report generation, particularly for chest X-rays (CXRs). This paper investigates whether incorporating pixel-level information through segmentation masks can improve fine-grained image interpretation of multimodal large language models (MLLMs) for radiology report generation. We introduce MAIRA-Seg, a segmentation-aware MLLM framework designed to utilize semantic segmentation masks alongside CXRs for generating radiology reports. We train expert segmentation models to obtain mask pseudolabels for radiology-specific structures in CXRs. Subsequently, building on the architectures of MAIRA, a CXR-specialised model for report generation, we integrate a trainable segmentation tokens extractor that leverages these mask pseudolabels, and employ mask-aware prompting to generate draft radiology reports. Our experiments on the publicly available MIMIC-CXR dataset show that MAIRA-Seg outperforms non-segmentation baselines. We also investigate set-of-marks prompting with MAIRA and find that MAIRA-Seg consistently demonstrates comparable or superior performance. The results confirm that using segmentation masks enhances the nuanced reasoning of MLLMs, potentially contributing to better clinical outcomes.

Cite this Paper


BibTeX
@InProceedings{pmlr-v259-sharma25a,
  title = {MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models},
  author = {Sharma, Harshita and Salvatelli, Valentina and Srivastav, Shaury and Bouzid, Kenza and Bannur, Shruthi and C. Castro, Daniel and Ilse, Maximilian and Bond-Taylor, Sam and Prasanna Ranjit, Mercy and Falck, Fabian and P{\'{e}}rez-Garc{\'{i}}a, Fernando and Schwaighofer, Anton and Richardson, Hannah and Wetscherek, Maria and Hyland, Stephanie and Alvarez-Valle, Javier},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages = {941--960},
  year = {2025},
  editor = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume = {259},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v259/main/assets/sharma25a/sharma25a.pdf},
  url = {https://proceedings.mlr.press/v259/sharma25a.html},
  abstract = {There is growing interest in applying AI to radiology report generation, particularly for chest X-rays (CXRs). This paper investigates whether incorporating pixel-level information through segmentation masks can improve fine-grained image interpretation of multimodal large language models (MLLMs) for radiology report generation. We introduce MAIRA-Seg, a segmentation-aware MLLM framework designed to utilize semantic segmentation masks alongside CXRs for generating radiology reports. We train expert segmentation models to obtain mask pseudolabels for radiology-specific structures in CXRs. Subsequently, building on the architectures of MAIRA, a CXR-specialised model for report generation, we integrate a trainable segmentation tokens extractor that leverages these mask pseudolabels, and employ mask-aware prompting to generate draft radiology reports. Our experiments on the publicly available MIMIC-CXR dataset show that MAIRA-Seg outperforms non-segmentation baselines. We also investigate set-of-marks prompting with MAIRA and find that MAIRA-Seg consistently demonstrates comparable or superior performance. The results confirm that using segmentation masks enhances the nuanced reasoning of MLLMs, potentially contributing to better clinical outcomes.}
}
Endnote
%0 Conference Paper
%T MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models
%A Harshita Sharma
%A Valentina Salvatelli
%A Shaury Srivastav
%A Kenza Bouzid
%A Shruthi Bannur
%A Daniel C. Castro
%A Maximilian Ilse
%A Sam Bond-Taylor
%A Mercy Prasanna Ranjit
%A Fabian Falck
%A Fernando Pérez-García
%A Anton Schwaighofer
%A Hannah Richardson
%A Maria Wetscherek
%A Stephanie Hyland
%A Javier Alvarez-Valle
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-sharma25a
%I PMLR
%P 941--960
%U https://proceedings.mlr.press/v259/sharma25a.html
%V 259
%X There is growing interest in applying AI to radiology report generation, particularly for chest X-rays (CXRs). This paper investigates whether incorporating pixel-level information through segmentation masks can improve fine-grained image interpretation of multimodal large language models (MLLMs) for radiology report generation. We introduce MAIRA-Seg, a segmentation-aware MLLM framework designed to utilize semantic segmentation masks alongside CXRs for generating radiology reports. We train expert segmentation models to obtain mask pseudolabels for radiology-specific structures in CXRs. Subsequently, building on the architectures of MAIRA, a CXR-specialised model for report generation, we integrate a trainable segmentation tokens extractor that leverages these mask pseudolabels, and employ mask-aware prompting to generate draft radiology reports. Our experiments on the publicly available MIMIC-CXR dataset show that MAIRA-Seg outperforms non-segmentation baselines. We also investigate set-of-marks prompting with MAIRA and find that MAIRA-Seg consistently demonstrates comparable or superior performance. The results confirm that using segmentation masks enhances the nuanced reasoning of MLLMs, potentially contributing to better clinical outcomes.
APA
Sharma, H., Salvatelli, V., Srivastav, S., Bouzid, K., Bannur, S., C. Castro, D., Ilse, M., Bond-Taylor, S., Prasanna Ranjit, M., Falck, F., Pérez-García, F., Schwaighofer, A., Richardson, H., Wetscherek, M., Hyland, S. & Alvarez-Valle, J. (2025). MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:941-960. Available from https://proceedings.mlr.press/v259/sharma25a.html.
