SleepQA: A Health Coaching Dataset on Sleep for Extractive Question Answering
Proceedings of the 2nd Machine Learning for Health symposium, PMLR 193:199-217, 2022.
Abstract
Question Answering (QA) systems can support health coaches in facilitating clients’ lifestyle behavior changes (e.g., in adopting healthy sleep habits). In this paper, we design a domain-specific QA pipeline for sleep coaching. To this end, we release SleepQA, a dataset created from 7,005 passages, comprising 4,250 training examples with single annotations and 750 examples with 5-way annotations. We fine-tune different domain-specific BERT models on our dataset and perform extensive automatic and human evaluation of the resulting end-to-end QA pipeline. Comparisons of our pipeline with a baseline show improvements in domain-specific natural language processing on real-world questions. We hope that this dataset will lead to wider research interest in this important health domain.