How does my language model understand clinical text?

Furong Jia, David Sontag, Monica Agrawal
Proceedings of the sixth Conference on Health, Inference, and Learning, PMLR 287:720-743, 2025.

Abstract

Large language models (LLMs) have performed well across a variety of clinical natural language processing tasks, despite not being directly trained on electronic health record (EHR) data. In this work, we examine how popular open-source LLMs learn clinical information from large mined corpora through two crucial but understudied lenses: (1) their interpretation of clinical jargon, a foundational ability for understanding real-world clinical notes, and (2) their responses to medical misinformation. For both use cases, we investigate the frequency of relevant clinical information in their corresponding pretraining corpora, the relationship between pretraining data composition and model outputs, and the sources underlying this data. To isolate clinical jargon understanding, we evaluate LLMs on a new dataset, MedLingo. Unsurprisingly, we find that the frequency of clinical jargon mentions across major pretraining corpora correlates with model performance. However, jargon that appears frequently in clinical notes is often rare in pretraining corpora, revealing a mismatch between available data and real-world usage. Similarly, we find that a non-negligible portion of documents support disputed claims that models can then parrot. Finally, we classify and analyze the types of online sources in which clinical jargon and misinformation appear, with implications for future dataset composition.
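As a rough illustration of the frequency analysis the abstract describes, the sketch below counts whole-word mentions of clinical jargon terms in a sample of pretraining documents and rank-correlates those counts with per-term model accuracy. Everything here is a hypothetical stand-in: the documents, term list, and accuracy scores are invented for illustration, and Spearman correlation is one reasonable choice of statistic, not necessarily the paper's; the actual corpora, MedLingo evaluation, and pipeline are not reproduced.

    # A minimal sketch, assuming hypothetical inputs: `documents` stands in for
    # a pretraining corpus sample, `jargon_terms` for MedLingo-style clinical
    # jargon, and `model_accuracy` for per-term LLM interpretation scores.
    import re
    from collections import Counter
    from scipy.stats import spearmanr

    documents = [
        "pt c/o sob, started on abx",        # hypothetical clinical-style text
        "patient denies sob; hx of htn",
        "no acute distress, continue abx",
    ]
    jargon_terms = ["sob", "abx", "htn", "nkda"]
    model_accuracy = {"sob": 0.95, "abx": 0.90, "htn": 0.85, "nkda": 0.40}

    # Count whole-word mentions of each jargon term across the corpus sample.
    counts = Counter()
    for doc in documents:
        for term in jargon_terms:
            counts[term] += len(re.findall(rf"\b{re.escape(term)}\b", doc.lower()))

    # Rank-correlate corpus frequency with model performance (rank correlation,
    # since term frequencies tend to be heavy-tailed).
    freqs = [counts[t] for t in jargon_terms]
    accs = [model_accuracy[t] for t in jargon_terms]
    rho, p = spearmanr(freqs, accs)
    print(f"Spearman rho={rho:.2f} (p={p:.3f})")

At real scale, the same counting step would run over billions of documents (e.g., with sharded streaming rather than an in-memory list), but the shape of the analysis is the same: term counts on one axis, model performance on the other.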

Cite this Paper


BibTeX
@InProceedings{pmlr-v287-jia25a,
  title     = {How does my language model understand clinical text?},
  author    = {Jia, Furong and Sontag, David and Agrawal, Monica},
  booktitle = {Proceedings of the sixth Conference on Health, Inference, and Learning},
  pages     = {720--743},
  year      = {2025},
  editor    = {Xu, Xuhai Orson and Choi, Edward and Singhal, Pankhuri and Gerych, Walter and Tang, Shengpu and Agrawal, Monica and Subbaswamy, Adarsh and Sizikova, Elena and Dunn, Jessilyn and Daneshjou, Roxana and Sarker, Tasmie and McDermott, Matthew and Chen, Irene},
  volume    = {287},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v287/main/assets/jia25a/jia25a.pdf},
  url       = {https://proceedings.mlr.press/v287/jia25a.html},
  abstract  = {Large language models (LLMs) have performed well across a variety of clinical natural language processing tasks, despite not being directly trained on electronic health record (EHR) data. In this work, we examine how popular open-source LLMs learn clinical information from large mined corpora through two crucial but understudied lenses: (1) their interpretation of clinical jargon, a foundational ability for understanding real-world clinical notes, and (2) their responses to medical misinformation. For both use cases, we investigate the frequency of relevant clinical information in their corresponding pretraining corpora, the relationship between pretraining data composition and model outputs, and the sources underlying this data. To isolate clinical jargon understanding, we evaluate LLMs on a new dataset, MedLingo. Unsurprisingly, we find that the frequency of clinical jargon mentions across major pretraining corpora correlates with model performance. However, jargon that appears frequently in clinical notes is often rare in pretraining corpora, revealing a mismatch between available data and real-world usage. Similarly, we find that a non-negligible portion of documents support disputed claims that models can then parrot. Finally, we classify and analyze the types of online sources in which clinical jargon and misinformation appear, with implications for future dataset composition.}
}
EndNote
%0 Conference Paper
%T How does my language model understand clinical text?
%A Furong Jia
%A David Sontag
%A Monica Agrawal
%B Proceedings of the sixth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Xuhai Orson Xu
%E Edward Choi
%E Pankhuri Singhal
%E Walter Gerych
%E Shengpu Tang
%E Monica Agrawal
%E Adarsh Subbaswamy
%E Elena Sizikova
%E Jessilyn Dunn
%E Roxana Daneshjou
%E Tasmie Sarker
%E Matthew McDermott
%E Irene Chen
%F pmlr-v287-jia25a
%I PMLR
%P 720--743
%U https://proceedings.mlr.press/v287/jia25a.html
%V 287
%X Large language models (LLMs) have performed well across a variety of clinical natural language processing tasks, despite not being directly trained on electronic health record (EHR) data. In this work, we examine how popular open-source LLMs learn clinical information from large mined corpora through two crucial but understudied lenses: (1) their interpretation of clinical jargon, a foundational ability for understanding real-world clinical notes, and (2) their responses to medical misinformation. For both use cases, we investigate the frequency of relevant clinical information in their corresponding pretraining corpora, the relationship between pretraining data composition and model outputs, and the sources underlying this data. To isolate clinical jargon understanding, we evaluate LLMs on a new dataset, MedLingo. Unsurprisingly, we find that the frequency of clinical jargon mentions across major pretraining corpora correlates with model performance. However, jargon that appears frequently in clinical notes is often rare in pretraining corpora, revealing a mismatch between available data and real-world usage. Similarly, we find that a non-negligible portion of documents support disputed claims that models can then parrot. Finally, we classify and analyze the types of online sources in which clinical jargon and misinformation appear, with implications for future dataset composition.
APA
Jia, F., Sontag, D., & Agrawal, M. (2025). How does my language model understand clinical text? Proceedings of the sixth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 287:720-743. Available from https://proceedings.mlr.press/v287/jia25a.html.
