[edit]
Detecting sensitive medical responses in general purpose large language models
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:680-695, 2025.
Abstract
Generalist large language models (LLMs), not developed to do particular medical tasks, have achieved widespread use by the public. To avoid medical uses of these LLMs that have not been adequately tested and thus minimize any potential health risks, it is paramount that these models use adequate guardrails and safety measures. In this work, we propose a synthetic medical prompt generation method to evaluate generalist LLMs and enable red-teaming efforts. Using a commercial LLM and our dataset of synthetic user prompts, we illustrate how our methodology may used to identify responses for further evaluation and to assess whether guardrails are consistently implemented. Finally, we investigate the use of Flan-T5 in detecting LLM responses that offer unvetted medical advice and neglect to instruct users to consult with licensed professionals.