Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records

Daeyoung Kim, Seongsu Bae, Seungho Kim, Edward Choi
Proceedings of the Conference on Health, Inference, and Learning, PMLR 174:138-151, 2022.

Abstract

Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to pre-defined data types by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, more primitive systemic language is needed. In this paper, we design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed data uncertainty is most indicative of the ambiguity in the input question.

Cite this Paper


BibTeX
@InProceedings{pmlr-v174-kim22a, title = {Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records}, author = {Kim, Daeyoung and Bae, Seongsu and Kim, Seungho and Choi, Edward}, booktitle = {Proceedings of the Conference on Health, Inference, and Learning}, pages = {138--151}, year = {2022}, editor = {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan}, volume = {174}, series = {Proceedings of Machine Learning Research}, month = {07--08 Apr}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v174/kim22a/kim22a.pdf}, url = {https://proceedings.mlr.press/v174/kim22a.html}, abstract = {Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to pre-defined data types by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, more primitive systemic language is needed. In this paper, we design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed data uncertainty is most indicative of the ambiguity in the input question.} }
Endnote
%0 Conference Paper %T Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records %A Daeyoung Kim %A Seongsu Bae %A Seungho Kim %A Edward Choi %B Proceedings of the Conference on Health, Inference, and Learning %C Proceedings of Machine Learning Research %D 2022 %E Gerardo Flores %E George H Chen %E Tom Pollard %E Joyce C Ho %E Tristan Naumann %F pmlr-v174-kim22a %I PMLR %P 138--151 %U https://proceedings.mlr.press/v174/kim22a.html %V 174 %X Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to pre-defined data types by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, more primitive systemic language is needed. In this paper, we design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed data uncertainty is most indicative of the ambiguity in the input question.
APA
Kim, D., Bae, S., Kim, S. & Choi, E.. (2022). Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records. Proceedings of the Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 174:138-151 Available from https://proceedings.mlr.press/v174/kim22a.html.

Related Material