Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding

Yuqing Wang, Yun Zhao, Linda Petzold
Proceedings of the 8th Machine Learning for Healthcare Conference, PMLR 219:804-823, 2023.

Abstract

Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance the performance of LLMs by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation highlights the importance of employing task-specific learning strategies and prompting techniques, such as SQP, to maximize the effectiveness of LLMs in healthcare-related tasks. Our study emphasizes the need for cautious implementation of LLMs in healthcare settings, ensuring a collaborative approach with domain experts and continuous verification by human experts to achieve responsible and effective use, ultimately contributing to improved patient care. Our code is available at https://github.com/EternityYW/LLM_healthcare.
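The abstract only sketches how self-questioning prompting (SQP) works: the model is first asked to pose and answer informative questions about the clinical scenario, and those self-generated question-answer pairs are then used as context for the actual task. As a rough illustration of that idea (not the authors' released implementation), here is a minimal Python sketch; the generic `complete(prompt)` callable, the two-stage structure, and the prompt wording are all assumptions for illustration.

    # Minimal sketch of self-questioning prompting (SQP) as described in the abstract.
    # `complete` is a stand-in for any LLM text-completion call; it is NOT the
    # paper's code. The prompt wording and two-stage structure are illustrative
    # assumptions, not the authors' templates.
    from typing import Callable

    def self_questioning_prompt(
        complete: Callable[[str], str],  # hypothetical LLM call: prompt -> text
        clinical_text: str,
        task_instruction: str,
    ) -> str:
        # Stage 1: ask the model to pose and answer clarifying questions
        # about the clinical scenario before attempting the task.
        probe = (
            f"Clinical text:\n{clinical_text}\n\n"
            "Generate a few informative questions about this scenario that "
            "would help with the task below, and answer each one briefly.\n"
            f"Task: {task_instruction}"
        )
        questions_and_answers = complete(probe)

        # Stage 2: feed the self-generated Q&A back in as context and ask
        # for the final task output.
        final = (
            f"Clinical text:\n{clinical_text}\n\n"
            f"Background questions and answers:\n{questions_and_answers}\n\n"
            f"Using the background above, complete the task: {task_instruction}"
        )
        return complete(final)

Under these assumptions, the same pattern would cover the six task types studied in the paper simply by changing `task_instruction` (e.g., extracting drug-disease relations versus classifying a clinical note).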

Cite this Paper


BibTeX
@InProceedings{pmlr-v219-wang23c,
  title     = {Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding},
  author    = {Wang, Yuqing and Zhao, Yun and Petzold, Linda},
  booktitle = {Proceedings of the 8th Machine Learning for Healthcare Conference},
  pages     = {804--823},
  year      = {2023},
  editor    = {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo and Yeung, Serene},
  volume    = {219},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--12 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v219/wang23c/wang23c.pdf},
  url       = {https://proceedings.mlr.press/v219/wang23c.html}
}
Endnote
%0 Conference Paper
%T Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
%A Yuqing Wang
%A Yun Zhao
%A Linda Petzold
%B Proceedings of the 8th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Kaivalya Deshpande
%E Madalina Fiterau
%E Shalmali Joshi
%E Zachary Lipton
%E Rajesh Ranganath
%E Iñigo Urteaga
%E Serene Yeung
%F pmlr-v219-wang23c
%I PMLR
%P 804--823
%U https://proceedings.mlr.press/v219/wang23c.html
%V 219
APA
Wang, Y., Zhao, Y., & Petzold, L. (2023). Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding. Proceedings of the 8th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 219:804-823. Available from https://proceedings.mlr.press/v219/wang23c.html.
