Instruction-guided deidentification with synthetic test cases for Norwegian clinical text
Proceedings of the 5th Northern Lights Deep Learning Conference (NLDL), PMLR 233:145-152, 2024.
Abstract
Deidentification methods, which remove directly identifying information, can be useful tools for mitigating the privacy risks associated with sharing healthcare data. However, benchmarks for evaluating deidentification methods are themselves often derived from real clinical data, which makes them sensitive and therefore harder to share and apply. Given the rapid advances in generative language modelling, we would like to leverage large language models to construct freely available deidentification benchmarks and to assist in the deidentification process. We apply the GPT-4 language model to construct, for the first time, a publicly available dataset of synthetic Norwegian discharge summaries with annotated identifying details, consisting of 1200 summaries averaging 100 words each. In a sample of these documents, we find that the generated annotations agree closely with human annotations, with an $F_1$ score of $0.983$. We then examine whether large language models can perform deidentification directly, proposing methods in which an instruction-tuned language model is prompted to either annotate or redact identifying details. Comparing the methods on our synthetic dataset and the NorSynthClinical-PHI dataset, we find that GPT-4 underperforms the baseline method proposed by Bråthen et al. (2021), suggesting that named entity recognition problems remain challenging for instruction-tuned language models.
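To make the annotate-or-redact setup concrete, the sketch below shows one plausible way to prompt an instruction-tuned model for either task via the OpenAI Python client. The model name, prompt wording, placeholder scheme, and tag format are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch (assumed setup, not the paper's exact prompts):
# prompt an instruction-tuned model to either annotate or redact
# identifying details in a clinical note.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical instructions; the tag names and placeholders are assumptions.
ANNOTATE_PROMPT = (
    "Mark every directly identifying detail in the following Norwegian "
    "discharge summary by wrapping it in XML-style tags, e.g. "
    "<First_Name>Ola</First_Name>. Return the full text otherwise unchanged."
)
REDACT_PROMPT = (
    "Replace every directly identifying detail in the following Norwegian "
    "discharge summary with a placeholder such as [NAME] or [DATE]. "
    "Return the full text otherwise unchanged."
)

def deidentify(summary: str, mode: str = "annotate") -> str:
    """Ask the model to annotate or redact identifying details."""
    instruction = ANNOTATE_PROMPT if mode == "annotate" else REDACT_PROMPT
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": summary},
        ],
        temperature=0,  # deterministic output simplifies evaluation
    )
    return response.choices[0].message.content
```

Under this kind of setup, the annotated output can be compared span-by-span against gold annotations to compute entity-level precision, recall, and $F_1$, which is the style of agreement measure the abstract reports.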