Minimal Data Maximum Impact: Lessons Learned from Real-World Unstructured Data in Paediatric Care

Jaskaran Singh Kawatra; Sebin Sabu; Pavithra Rajendran; Caroline Baumgartner; Avish Vijayaraghavan; Ewart Jonny Sheldon; John Booth; Neil Sebire; Shiren Patel; Alexandros Zenonos; Rebecca Pope


Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 281:70-78, 2025.

Abstract

Digital health records contain a significant volume of pertinent, routine information locked within unstructured text. Current processes require costly human annotation from a limited number of expert annotators with sufficient domain knowledge, as well as clinicians’ time to verify the outcomes. Our proposed two-stage automated approach enables (1) the training and validation of fine-tuned few-shot domain-specific models, first to retrieve relevant documents and then to perform entity recognition on the retrieved document chunks, identifying the correct spans of text for the use case at hand; and (2) a “shadow deployment” pipeline that tests an end-to-end solution in a pre-production environment. Our shadow deployment pipeline uses Large Language Models (LLMs) as an explainer-in-the-loop and a Natural Language Inference (NLI)-based verification approach to reduce the dependency on clinicians to validate the outcomes of the solution. In this paper, we describe the experiments and results of deploying and testing our proposed approach within a real-world paediatric healthcare setting, with a focus on histopathology reports of tumours; the approach can help answer clinical questions in a timely manner.

Cite this Paper

BibTeX

@InProceedings{pmlr-v281-kawatra25a,
  title = 	 {Minimal Data Maximum Impact: Lessons Learned from Real-World Unstructured Data in Paediatric Care},
  author =       {Kawatra, Jaskaran Singh and Sabu, Sebin and Rajendran, Pavithra and Baumgartner, Caroline and Vijayaraghavan, Avish and Sheldon, Ewart Jonny and Booth, John and Sebire, Neil and Patel, Shiren and Zenonos, Alexandros and Pope, Rebecca},
  booktitle = 	 {Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare},
  pages = 	 {70--78},
  year = 	 {2025},
  editor = 	 {Wu, Junde and Zhu, Jiayuan and Xu, Min and Jin, Yueming},
  volume = 	 {281},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25 Feb},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v281/main/assets/kawatra25a/kawatra25a.pdf},
  url = 	 {https://proceedings.mlr.press/v281/kawatra25a.html},
  abstract = 	 {Digital health records contain a significant volume of pertinent, routine information locked within unstructured text. Current processes require costly human annotation from a limited number of expert annotators with sufficient domain knowledge, as well as clinicians’ time to verify the outcomes. Our proposed two-stage automated approach enables (1) the training and validation of fine-tuned few-shot domain-specific models, first to retrieve relevant documents and then to perform entity recognition on the retrieved document chunks, identifying the correct spans of text for the use case at hand; and (2) a “shadow deployment” pipeline that tests an end-to-end solution in a pre-production environment. Our shadow deployment pipeline uses Large Language Models (LLMs) as an explainer-in-the-loop and a Natural Language Inference (NLI)-based verification approach to reduce the dependency on clinicians to validate the outcomes of the solution. In this paper, we describe the experiments and results of deploying and testing our proposed approach within a real-world paediatric healthcare setting, with a focus on histopathology reports of tumours; the approach can help answer clinical questions in a timely manner.}
}

Endnote

%0 Conference Paper
%T Minimal Data Maximum Impact: Lessons Learned from Real-World Unstructured Data in Paediatric Care
%A Jaskaran Singh Kawatra
%A Sebin Sabu
%A Pavithra Rajendran
%A Caroline Baumgartner
%A Avish Vijayaraghavan
%A Ewart Jonny Sheldon
%A John Booth
%A Neil Sebire
%A Shiren Patel
%A Alexandros Zenonos
%A Rebecca Pope
%B Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare
%C Proceedings of Machine Learning Research
%D 2025
%E Junde Wu
%E Jiayuan Zhu
%E Min Xu
%E Yueming Jin
%F pmlr-v281-kawatra25a
%I PMLR
%P 70--78
%U https://proceedings.mlr.press/v281/kawatra25a.html
%V 281
%X Digital health records contain a significant volume of pertinent, routine information locked within unstructured text. Current processes require costly human annotation from a limited number of expert annotators with sufficient domain knowledge, as well as clinicians’ time to verify the outcomes. Our proposed two-stage automated approach enables (1) the training and validation of fine-tuned few-shot domain-specific models, first to retrieve relevant documents and then to perform entity recognition on the retrieved document chunks, identifying the correct spans of text for the use case at hand; and (2) a “shadow deployment” pipeline that tests an end-to-end solution in a pre-production environment. Our shadow deployment pipeline uses Large Language Models (LLMs) as an explainer-in-the-loop and a Natural Language Inference (NLI)-based verification approach to reduce the dependency on clinicians to validate the outcomes of the solution. In this paper, we describe the experiments and results of deploying and testing our proposed approach within a real-world paediatric healthcare setting, with a focus on histopathology reports of tumours; the approach can help answer clinical questions in a timely manner.

APA

Kawatra, J.S., Sabu, S., Rajendran, P., Baumgartner, C., Vijayaraghavan, A., Sheldon, E.J., Booth, J., Sebire, N., Patel, S., Zenonos, A. & Pope, R. (2025). Minimal Data Maximum Impact: Lessons Learned from Real-World Unstructured Data in Paediatric Care. Proceedings of The First AAAI Bridge Program on AI for Medicine and Healthcare, in Proceedings of Machine Learning Research 281:70-78. Available from https://proceedings.mlr.press/v281/kawatra25a.html.
