Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Proceedings of the 8th Machine Learning for Healthcare Conference, PMLR 219:804-823, 2023.
Abstract
Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, on clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance the performance of LLMs by eliciting informative questions and answers pertinent to the clinical scenario at hand. Our evaluation highlights the importance of employing task-specific learning strategies and prompting techniques, such as SQP, to maximize the effectiveness of LLMs in healthcare-related tasks. Our study emphasizes the need for cautious deployment of LLMs in healthcare settings, with collaboration from domain experts and continuous verification by human experts, to achieve responsible and effective use and ultimately contribute to improved patient care. Our code is available at https://github.com/EternityYW/LLM_healthcare.
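To make the self-questioning prompting idea concrete, the sketch below shows one plausible two-stage realization: the model is first asked to pose and answer background questions about the clinical input, then to solve the task conditioned on those question-answer pairs. This is an illustrative assumption, not the paper's exact prompt templates; the function names, prompt wording, and the `llm` callable are placeholders for any chat-completion model (e.g., GPT-3.5, GPT-4, or Bard).

```python
from typing import Callable


def self_questioning_prompt(task_description: str, clinical_text: str,
                            llm: Callable[[str], str]) -> str:
    """Two-stage self-questioning prompting (SQP) sketch.

    Stage 1: elicit informative questions and answers about the clinical
    scenario. Stage 2: answer the original task conditioned on that
    self-generated background. Prompt wording is illustrative only.
    """
    # Stage 1: ask the model to generate and answer background questions.
    questioning_prompt = (
        f"Clinical text:\n{clinical_text}\n\n"
        "Before solving the task, ask yourself a few questions that would "
        "help you understand this clinical scenario, and answer each one."
    )
    qa_pairs = llm(questioning_prompt)

    # Stage 2: solve the task, conditioned on the self-generated Q&A.
    answering_prompt = (
        f"Clinical text:\n{clinical_text}\n\n"
        f"Background questions and answers:\n{qa_pairs}\n\n"
        f"Task: {task_description}\n"
        "Using the background above, give your final answer."
    )
    return llm(answering_prompt)


if __name__ == "__main__":
    # Placeholder LLM call for demonstration; swap in a real API client.
    dummy_llm = lambda prompt: "(model response here)"
    print(self_questioning_prompt(
        task_description="Does the premise entail the hypothesis? (natural language inference)",
        clinical_text="Patient denies chest pain; troponin levels within normal limits.",
        llm=dummy_llm,
    ))
```

In this reading, SQP trades an extra model call for richer task-specific context, which is the kind of task-adapted prompting strategy the abstract argues is needed to get reliable performance from general-purpose LLMs on clinical tasks.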