Are Clinical T5 Models Better for Clinical Text?

Yahan Li, Keith Harrigian, Ayah Zirikly, Mark Dredze
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:636-667, 2025.

Abstract

Large language models with a transformer-based encoder/decoder architecture, such as T5, have become standard platforms for supervised tasks. To bring these technologies to the clinical domain, recent work has trained new or adapted existing models to clinical data. However, the evaluation of these clinical T5 models and comparison to other models has been limited. Are the clinical T5 models better choices than FLAN-tuned generic T5 models? Do they generalize better to new clinical domains that differ from the training sets? We comprehensively evaluate these models across several clinical tasks and domains. We find that clinical T5 models provide marginal improvements over existing models, and perform worse when evaluated on different domains. Our results inform future choices in developing clinical LLMs.
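
For context on the comparison the abstract describes, the sketch below shows one plausible way to query a FLAN-tuned generic T5 and a clinical T5 checkpoint side by side with the Hugging Face transformers API. It is a minimal illustration, not the paper's actual experimental setup: "google/flan-t5-base" is a publicly released FLAN-tuned T5, while the clinical checkpoint name is a placeholder to be replaced with whichever clinical T5 release is under study.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    CHECKPOINTS = [
        "google/flan-t5-base",           # public FLAN-tuned T5 release
        # "your-org/clinical-t5-base",   # placeholder: substitute the clinical T5 being evaluated
    ]

    note = "Patient presents with shortness of breath and bilateral leg edema."
    prompt = f"Classify the likely diagnosis category: {note}"

    for name in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSeq2SeqLM.from_pretrained(name)
        inputs = tokenizer(prompt, return_tensors="pt")
        # T5 is text-to-text: labels are generated as output tokens,
        # not read from a dedicated classification head.
        output_ids = model.generate(**inputs, max_new_tokens=10)
        print(name, "->", tokenizer.decode(output_ids[0], skip_special_tokens=True))

Because both model families expose the same sequence-to-sequence interface, this kind of comparison reduces to swapping checkpoints while holding the task prompts and fine-tuning procedure fixed.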

Cite this Paper

BibTeX
@InProceedings{pmlr-v259-li25a,
  title     = {Are Clinical T5 Models Better for Clinical Text?},
  author    = {Li, Yahan and Harrigian, Keith and Zirikly, Ayah and Dredze, Mark},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages     = {636--667},
  year      = {2025},
  editor    = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume    = {259},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v259/main/assets/li25a/li25a.pdf},
  url       = {https://proceedings.mlr.press/v259/li25a.html},
  abstract  = {Large language models with a transformer-based encoder/decoder architecture, such as T5, have become standard platforms for supervised tasks. To bring these technologies to the clinical domain, recent work has trained new or adapted existing models to clinical data. However, the evaluation of these clinical T5 models and comparison to other models has been limited. Are the clinical T5 models better choices than FLAN-tuned generic T5 models? Do they generalize better to new clinical domains that differ from the training sets? We comprehensively evaluate these models across several clinical tasks and domains. We find that clinical T5 models provide marginal improvements over existing models, and perform worse when evaluated on different domains. Our results inform future choices in developing clinical LLMs.}
}
Endnote
%0 Conference Paper
%T Are Clinical T5 Models Better for Clinical Text?
%A Yahan Li
%A Keith Harrigian
%A Ayah Zirikly
%A Mark Dredze
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-li25a
%I PMLR
%P 636--667
%U https://proceedings.mlr.press/v259/li25a.html
%V 259
%X Large language models with a transformer-based encoder/decoder architecture, such as T5, have become standard platforms for supervised tasks. To bring these technologies to the clinical domain, recent work has trained new or adapted existing models to clinical data. However, the evaluation of these clinical T5 models and comparison to other models has been limited. Are the clinical T5 models better choices than FLAN-tuned generic T5 models? Do they generalize better to new clinical domains that differ from the training sets? We comprehensively evaluate these models across several clinical tasks and domains. We find that clinical T5 models provide marginal improvements over existing models, and perform worse when evaluated on different domains. Our results inform future choices in developing clinical LLMs.
APA
Li, Y., Harrigian, K., Zirikly, A. & Dredze, M. (2025). Are Clinical T5 Models Better for Clinical Text? Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:636-667. Available from https://proceedings.mlr.press/v259/li25a.html.
