Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture

Seongsu Bae; Daeyoung Kim; Jiho Kim; Edward Choi

Proceedings of Machine Learning for Health, PMLR 158:13-25, 2021.

Abstract

An intelligent machine that can answer human questions based on electronic health records (EHR-QA) has great practical value, such as supporting clinical decisions, managing hospital administration, and powering medical chatbots. Previous table-based QA studies that focus on translating natural language questions into table queries (NLQ2SQL), however, struggle with the unique nature of EHR data: its complex and specialized medical terminology increases decoding difficulty. In this paper, we design UniQA, a unified encoder-decoder architecture for EHR-QA where natural language questions are converted to queries such as SQL or SPARQL. We also propose input masking (IM), a simple and effective method to cope with complex medical terms and various typos and to better learn the SQL/SPARQL syntax. Combining the unified architecture with an effective auxiliary training objective, UniQA demonstrated a significant performance improvement over the previous state-of-the-art model on MIMICSQL* (14.2% gain), the most complex NLQ2SQL dataset in the EHR domain, and on its typo-ridden versions (28.8% gain). In addition, we confirmed consistent results on the graph-based EHR-QA dataset, MIMICSPARQL*.

Cite this Paper

BibTeX

@InProceedings{pmlr-v158-bae21a,
  title     = {Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture},
  author    = {Bae, Seongsu and Kim, Daeyoung and Kim, Jiho and Choi, Edward},
  booktitle = {Proceedings of Machine Learning for Health},
  pages     = {13--25},
  year      = {2021},
  editor    = {Roy, Subhrajit and Pfohl, Stephen and Rocheteau, Emma and Tadesse, Girmaw Abebe and Oala, Luis and Falck, Fabian and Zhou, Yuyin and Shen, Liyue and Zamzmi, Ghada and Mugambi, Purity and Zirikly, Ayah and McDermott, Matthew B. A. and Alsentzer, Emily},
  volume    = {158},
  series    = {Proceedings of Machine Learning Research},
  month     = {04 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v158/bae21a/bae21a.pdf},
  url       = {https://proceedings.mlr.press/v158/bae21a.html},
  abstract  = {An intelligent machine that can answer human questions based on electronic health records (EHR-QA) has a great practical value, such as supporting clinical decisions, managing hospital administration, and medical chatbots. Previous table-based QA studies focusing on translating natural questions into table queries (NLQ2SQL), however, suffer from the unique nature of EHR data due to complex and specialized medical terminology, hence increased decoding difficulty. In this paper, we design UniQA, a unified encoder-decoder architecture for EHR-QA where natural language questions are converted to queries such as SQL or SPARQL. We also propose input masking (IM), a simple and effective method to cope with complex medical terms and various typos and better learn the SQL/SPARQL syntax. Combining the unified architecture with an effective auxiliary training objective, UniQA demonstrated a significant performance improvement against the previous state-of-the-art model for MIMICSQL* (14.2% gain), the most complex NLQ2SQL dataset in the EHR domain, and its typo-ridden versions (28.8% gain). In addition, we confirmed consistent results for the graph-based EHR-QA dataset, MIMICSPARQL*.}
}

Endnote

%0 Conference Paper
%T Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture
%A Seongsu Bae
%A Daeyoung Kim
%A Jiho Kim
%A Edward Choi
%B Proceedings of Machine Learning for Health
%C Proceedings of Machine Learning Research
%D 2021
%E Subhrajit Roy
%E Stephen Pfohl
%E Emma Rocheteau
%E Girmaw Abebe Tadesse
%E Luis Oala
%E Fabian Falck
%E Yuyin Zhou
%E Liyue Shen
%E Ghada Zamzmi
%E Purity Mugambi
%E Ayah Zirikly
%E Matthew B. A. McDermott
%E Emily Alsentzer
%F pmlr-v158-bae21a
%I PMLR
%P 13--25
%U https://proceedings.mlr.press/v158/bae21a.html
%V 158
%X An intelligent machine that can answer human questions based on electronic health records (EHR-QA) has a great practical value, such as supporting clinical decisions, managing hospital administration, and medical chatbots. Previous table-based QA studies focusing on translating natural questions into table queries (NLQ2SQL), however, suffer from the unique nature of EHR data due to complex and specialized medical terminology, hence increased decoding difficulty. In this paper, we design UniQA, a unified encoder-decoder architecture for EHR-QA where natural language questions are converted to queries such as SQL or SPARQL. We also propose input masking (IM), a simple and effective method to cope with complex medical terms and various typos and better learn the SQL/SPARQL syntax. Combining the unified architecture with an effective auxiliary training objective, UniQA demonstrated a significant performance improvement against the previous state-of-the-art model for MIMICSQL* (14.2% gain), the most complex NLQ2SQL dataset in the EHR domain, and its typo-ridden versions (28.8% gain). In addition, we confirmed consistent results for the graph-based EHR-QA dataset, MIMICSPARQL*.

APA

Bae, S., Kim, D., Kim, J., & Choi, E. (2021). Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture. Proceedings of Machine Learning for Health, in Proceedings of Machine Learning Research 158:13-25. Available from https://proceedings.mlr.press/v158/bae21a.html.
