Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection

Bhawesh Kumar, Jonathan Amar, Eric Yang, Nan Li, Yugang jia
Proceedings of the 9th Machine Learning for Healthcare Conference, PMLR 252, 2024.

Abstract

Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific, expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from Gemini Pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table classification task, thereby reducing the noise in the auto-generated labels. We find that PaLM-2 fine-tuned with filtered labels outperforms Gemini Pro 1.0 and other LLMs on this task and achieves performance close to that of PaLM-2 fine-tuned on non-expert human annotations. Our results show that leveraging LLM-generated labels, coupled with strategic filtering, can be a viable and cost-effective strategy for improving LLM performance on specialized tasks, especially in domains where expert annotations are scarce, expensive, or time-consuming to obtain.
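The filtering idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual mechanism: it assumes confidence is estimated by sampling the labeling LLM several times per document and keeping only documents whose labels agree above a threshold; the function name and agreement criterion are hypothetical.

```python
from collections import Counter

def filter_high_confidence(labels_per_doc, min_agreement=0.8):
    """Keep only documents whose repeated LLM labels strongly agree.

    labels_per_doc: dict mapping a document id to a list of labels
    sampled from the labeling LLM for the same input.
    Returns a dict of document id -> majority label, restricted to
    documents whose majority label reaches the agreement threshold.
    """
    kept = {}
    for doc_id, labels in labels_per_doc.items():
        # Majority label and its vote count among the sampled labels.
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            kept[doc_id] = label
    return kept
```

The surviving (document, label) pairs would then form the training set for the downstream PEFT fine-tuning step.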

Cite this Paper


BibTeX
@InProceedings{pmlr-v252-kumar24a,
  title     = {Selective Fine-tuning on {LLM}-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection},
  author    = {Kumar, Bhawesh and Amar, Jonathan and Yang, Eric and Li, Nan and jia, Yugang},
  booktitle = {Proceedings of the 9th Machine Learning for Healthcare Conference},
  year      = {2024},
  editor    = {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo},
  volume    = {252},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--17 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v252/main/assets/kumar24a/kumar24a.pdf},
  url       = {https://proceedings.mlr.press/v252/kumar24a.html},
  abstract  = {Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific, expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from Gemini Pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table classification task, thereby reducing the noise in the auto-generated labels. We find that PaLM-2 fine-tuned with filtered labels outperforms Gemini Pro 1.0 and other LLMs on this task and achieves performance close to that of PaLM-2 fine-tuned on non-expert human annotations. Our results show that leveraging LLM-generated labels, coupled with strategic filtering, can be a viable and cost-effective strategy for improving LLM performance on specialized tasks, especially in domains where expert annotations are scarce, expensive, or time-consuming to obtain.}
}
Endnote
%0 Conference Paper
%T Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection
%A Bhawesh Kumar
%A Jonathan Amar
%A Eric Yang
%A Nan Li
%A Yugang jia
%B Proceedings of the 9th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Kaivalya Deshpande
%E Madalina Fiterau
%E Shalmali Joshi
%E Zachary Lipton
%E Rajesh Ranganath
%E Iñigo Urteaga
%F pmlr-v252-kumar24a
%I PMLR
%U https://proceedings.mlr.press/v252/kumar24a.html
%V 252
%X Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific, expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from Gemini Pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table classification task, thereby reducing the noise in the auto-generated labels. We find that PaLM-2 fine-tuned with filtered labels outperforms Gemini Pro 1.0 and other LLMs on this task and achieves performance close to that of PaLM-2 fine-tuned on non-expert human annotations. Our results show that leveraging LLM-generated labels, coupled with strategic filtering, can be a viable and cost-effective strategy for improving LLM performance on specialized tasks, especially in domains where expert annotations are scarce, expensive, or time-consuming to obtain.
APA
Kumar, B., Amar, J., Yang, E., Li, N. & jia, Y. (2024). Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection. Proceedings of the 9th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 252. Available from https://proceedings.mlr.press/v252/kumar24a.html.