Detecting sensitive medical responses in general purpose large language models

Daniel Lopez-Martinez, Abhishek Bafna
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:680-695, 2025.

Abstract

Generalist large language models (LLMs), not developed to do particular medical tasks, have achieved widespread use by the public. To avoid medical uses of these LLMs that have not been adequately tested and thus minimize any potential health risks, it is paramount that these models use adequate guardrails and safety measures. In this work, we propose a synthetic medical prompt generation method to evaluate generalist LLMs and enable red-teaming efforts. Using a commercial LLM and our dataset of synthetic user prompts, we illustrate how our methodology may be used to identify responses for further evaluation and to assess whether guardrails are consistently implemented. Finally, we investigate the use of Flan-T5 in detecting LLM responses that offer unvetted medical advice and neglect to instruct users to consult with licensed professionals.

Cite this Paper


BibTeX
@InProceedings{pmlr-v259-lopez-martinez25a,
  title     = {Detecting sensitive medical responses in general purpose large language models},
  author    = {Lopez-Martinez, Daniel and Bafna, Abhishek},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages     = {680--695},
  year      = {2025},
  editor    = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume    = {259},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v259/main/assets/lopez-martinez25a/lopez-martinez25a.pdf},
  url       = {https://proceedings.mlr.press/v259/lopez-martinez25a.html},
  abstract  = {Generalist large language models (LLMs), not developed to do particular medical tasks, have achieved widespread use by the public. To avoid medical uses of these LLMs that have not been adequately tested and thus minimize any potential health risks, it is paramount that these models use adequate guardrails and safety measures. In this work, we propose a synthetic medical prompt generation method to evaluate generalist LLMs and enable red-teaming efforts. Using a commercial LLM and our dataset of synthetic user prompts, we illustrate how our methodology may be used to identify responses for further evaluation and to assess whether guardrails are consistently implemented. Finally, we investigate the use of Flan-T5 in detecting LLM responses that offer unvetted medical advice and neglect to instruct users to consult with licensed professionals.}
}
Endnote
%0 Conference Paper
%T Detecting sensitive medical responses in general purpose large language models
%A Daniel Lopez-Martinez
%A Abhishek Bafna
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-lopez-martinez25a
%I PMLR
%P 680--695
%U https://proceedings.mlr.press/v259/lopez-martinez25a.html
%V 259
%X Generalist large language models (LLMs), not developed to do particular medical tasks, have achieved widespread use by the public. To avoid medical uses of these LLMs that have not been adequately tested and thus minimize any potential health risks, it is paramount that these models use adequate guardrails and safety measures. In this work, we propose a synthetic medical prompt generation method to evaluate generalist LLMs and enable red-teaming efforts. Using a commercial LLM and our dataset of synthetic user prompts, we illustrate how our methodology may be used to identify responses for further evaluation and to assess whether guardrails are consistently implemented. Finally, we investigate the use of Flan-T5 in detecting LLM responses that offer unvetted medical advice and neglect to instruct users to consult with licensed professionals.
APA
Lopez-Martinez, D. & Bafna, A. (2025). Detecting sensitive medical responses in general purpose large language models. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:680-695. Available from https://proceedings.mlr.press/v259/lopez-martinez25a.html.

Related Material