Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?

Hye Sun Yun, Karen Y.C. Zhang, Ramez Kouzy, Iain James Marshall, Junyi Jessy Li, Byron C Wallace
Proceedings of the sixth Conference on Health, Inference, and Learning, PMLR 287:458-479, 2025.

Abstract

Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin’s impact on LLM outputs.

Cite this Paper

BibTeX
@InProceedings{pmlr-v287-yun25a,
  title     = {Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?},
  author    = {Yun, Hye Sun and Zhang, Karen Y.C. and Kouzy, Ramez and Marshall, Iain James and Li, Junyi Jessy and Wallace, Byron C},
  booktitle = {Proceedings of the sixth Conference on Health, Inference, and Learning},
  pages     = {458--479},
  year      = {2025},
  editor    = {Xu, Xuhai Orson and Choi, Edward and Singhal, Pankhuri and Gerych, Walter and Tang, Shengpu and Agrawal, Monica and Subbaswamy, Adarsh and Sizikova, Elena and Dunn, Jessilyn and Daneshjou, Roxana and Sarker, Tasmie and McDermott, Matthew and Chen, Irene},
  volume    = {287},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v287/main/assets/yun25a/yun25a.pdf},
  url       = {https://proceedings.mlr.press/v287/yun25a.html},
  abstract  = {Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin’s impact on LLM outputs.}
}
Endnote
%0 Conference Paper
%T Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
%A Hye Sun Yun
%A Karen Y.C. Zhang
%A Ramez Kouzy
%A Iain James Marshall
%A Junyi Jessy Li
%A Byron C Wallace
%B Proceedings of the sixth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Xuhai Orson Xu
%E Edward Choi
%E Pankhuri Singhal
%E Walter Gerych
%E Shengpu Tang
%E Monica Agrawal
%E Adarsh Subbaswamy
%E Elena Sizikova
%E Jessilyn Dunn
%E Roxana Daneshjou
%E Tasmie Sarker
%E Matthew McDermott
%E Irene Chen
%F pmlr-v287-yun25a
%I PMLR
%P 458--479
%U https://proceedings.mlr.press/v287/yun25a.html
%V 287
%X Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin’s impact on LLM outputs.
APA
Yun, H.S., Zhang, K.Y.C., Kouzy, R., Marshall, I.J., Li, J.J. & Wallace, B.C. (2025). Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature? Proceedings of the sixth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 287:458-479. Available from https://proceedings.mlr.press/v287/yun25a.html.