Minimal Data Maximum Impact: Lessons Learned from Real-World Unstructured Data in Paediatric Care
Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 281:70-78, 2025.
Abstract
Digital health records contain a significant volume of pertinent, routine information locked within unstructured text. Current processes require costly human annotation from a limited number of expert annotators with sufficient domain knowledge, as well as clinicians' time to verify the outcomes. Our proposed two-stage automated approach enables (1) training and validation of fine-tuned, few-shot, domain-specific models, which first retrieve relevant documents and then perform entity recognition on the retrieved document chunks to identify the correct spans of text for the use case at hand, and (2) a "shadow deployment" pipeline that tests an end-to-end solution in a pre-production environment. Our shadow deployment pipeline uses Large Language Models (LLMs) as an explainer-in-the-loop together with a Natural Language Inference (NLI) based verification approach to reduce the dependency on a clinician to validate the outcomes of the solution. In this paper, we describe the experiments and results of deploying and testing our proposed approach within a real-world paediatric healthcare setting, with a focus on histopathology reports of tumours, to help answer clinical questions in a timely manner.