ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation

Xiaoman Zhang, Hong-Yu Zhou, Xiaoli Yang, Oishi Banerjee, Julián N. Acosta, Josh Miller, Ouwen Huang, Pranav Rajpurkar
Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 281:90-99, 2025.

Abstract

AI-driven models have demonstrated significant potential in automating radiology report generation for chest X-rays. However, there is no standardized benchmark for objectively evaluating their performance. To address this, we present ReXrank, a public leaderboard and challenge for assessing AI-powered radiology report generation. Our framework incorporates ReXGradient, the largest test dataset consisting of 10,000 studies, and three public datasets (MIMIC-CXR, IU-Xray, CheXpert Plus) for report generation assessment. ReXrank employs 8 evaluation metrics and separately assesses models capable of generating only findings sections and those providing both findings and impressions sections. By providing this standardized evaluation framework, ReXrank enables meaningful comparisons of model performance and offers crucial insights into their robustness across diverse clinical settings. Beyond its current focus on chest X-rays, ReXrank’s framework sets the stage for comprehensive evaluation of automated reporting across the full spectrum of medical imaging.

Cite this Paper

BibTeX
@InProceedings{pmlr-v281-zhang25b,
  title     = {ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation},
  author    = {Zhang, Xiaoman and Zhou, Hong-Yu and Yang, Xiaoli and Banerjee, Oishi and Acosta, Juli\'an N. and Miller, Josh and Huang, Ouwen and Rajpurkar, Pranav},
  booktitle = {Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare},
  pages     = {90--99},
  year      = {2025},
  editor    = {Wu, Junde and Zhu, Jiayuan and Xu, Min and Jin, Yueming},
  volume    = {281},
  series    = {Proceedings of Machine Learning Research},
  month     = {25 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v281/main/assets/zhang25b/zhang25b.pdf},
  url       = {https://proceedings.mlr.press/v281/zhang25b.html},
  abstract  = {AI-driven models have demonstrated significant potential in automating radiology report generation for chest X-rays. However, there is no standardized benchmark for objectively evaluating their performance. To address this, we present ReXrank, a public leaderboard and challenge for assessing AI-powered radiology report generation. Our framework incorporates ReXGradient, the largest test dataset consisting of 10,000 studies, and three public datasets (MIMIC-CXR, IU-Xray, CheXpert Plus) for report generation assessment. ReXrank employs 8 evaluation metrics and separately assesses models capable of generating only findings sections and those providing both findings and impressions sections. By providing this standardized evaluation framework, ReXrank enables meaningful comparisons of model performance and offers crucial insights into their robustness across diverse clinical settings. Beyond its current focus on chest X-rays, ReXrank's framework sets the stage for comprehensive evaluation of automated reporting across the full spectrum of medical imaging.}
}
Endnote
%0 Conference Paper
%T ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation
%A Xiaoman Zhang
%A Hong-Yu Zhou
%A Xiaoli Yang
%A Oishi Banerjee
%A Julián N. Acosta
%A Josh Miller
%A Ouwen Huang
%A Pranav Rajpurkar
%B Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare
%C Proceedings of Machine Learning Research
%D 2025
%E Junde Wu
%E Jiayuan Zhu
%E Min Xu
%E Yueming Jin
%F pmlr-v281-zhang25b
%I PMLR
%P 90--99
%U https://proceedings.mlr.press/v281/zhang25b.html
%V 281
%X AI-driven models have demonstrated significant potential in automating radiology report generation for chest X-rays. However, there is no standardized benchmark for objectively evaluating their performance. To address this, we present ReXrank, a public leaderboard and challenge for assessing AI-powered radiology report generation. Our framework incorporates ReXGradient, the largest test dataset consisting of 10,000 studies, and three public datasets (MIMIC-CXR, IU-Xray, CheXpert Plus) for report generation assessment. ReXrank employs 8 evaluation metrics and separately assesses models capable of generating only findings sections and those providing both findings and impressions sections. By providing this standardized evaluation framework, ReXrank enables meaningful comparisons of model performance and offers crucial insights into their robustness across diverse clinical settings. Beyond its current focus on chest X-rays, ReXrank's framework sets the stage for comprehensive evaluation of automated reporting across the full spectrum of medical imaging.
APA
Zhang, X., Zhou, H., Yang, X., Banerjee, O., Acosta, J. N., Miller, J., Huang, O., & Rajpurkar, P. (2025). ReXrank: A public leaderboard for AI-powered radiology report generation. Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare, in Proceedings of Machine Learning Research, 281, 90-99. Available from https://proceedings.mlr.press/v281/zhang25b.html.