SleepQA: A Health Coaching Dataset on Sleep for Extractive Question Answering

Iva Bojic, Qi Chwen Ong, Megh Thakkar, Esha Kamran, Irving Yu Le Shua, Jaime Rei Ern Pang, Jessica Chen, Vaaruni Nayak, Shafiq Joty, Josip Car
Proceedings of the 2nd Machine Learning for Health symposium, PMLR 193:199-217, 2022.

Abstract

Question Answering (QA) systems can support health coaches in facilitating clients’ lifestyle behavior changes (e.g., in adopting healthy sleep habits). In this paper, we design a domain-specific QA pipeline for sleep coaching. To this end, we release SleepQA, a dataset created from 7,005 passages comprising 4,250 training examples with single annotations and 750 examples with 5-way annotations. We fine-tune different domain-specific BERT models on our dataset and perform extensive automatic and human evaluation of the resulting end-to-end QA pipeline. Comparisons of our pipeline with a baseline show improvements in domain-specific natural language processing on real-world questions. We hope that this dataset will lead to wider research interest in this important health domain.

Cite this Paper


BibTeX
@InProceedings{pmlr-v193-bojic22a,
  title = {SleepQA: A Health Coaching Dataset on Sleep for Extractive Question Answering},
  author = {Bojic, Iva and Ong, Qi Chwen and Thakkar, Megh and Kamran, Esha and Shua, Irving Yu Le and Pang, Jaime Rei Ern and Chen, Jessica and Nayak, Vaaruni and Joty, Shafiq and Car, Josip},
  booktitle = {Proceedings of the 2nd Machine Learning for Health symposium},
  pages = {199--217},
  year = {2022},
  editor = {Parziale, Antonio and Agrawal, Monica and Joshi, Shalmali and Chen, Irene Y. and Tang, Shengpu and Oala, Luis and Subbaswamy, Adarsh},
  volume = {193},
  series = {Proceedings of Machine Learning Research},
  month = {28 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v193/bojic22a/bojic22a.pdf},
  url = {https://proceedings.mlr.press/v193/bojic22a.html},
  abstract = {Question Answering (QA) systems can support health coaches in facilitating clients’ lifestyle behavior changes (e.g., in adopting healthy sleep habits). In this paper, we design a domain-specific QA pipeline for sleep coaching. To this end, we release SleepQA, a dataset created from 7,005 passages comprising 4,250 training examples with single annotations and 750 examples with 5-way annotations. We fine-tuned different domain-specific BERT models on our dataset and perform extensive automatic and human evaluation of the resulting end-to-end QA pipeline. Comparisons of our pipeline with baseline show improvements in domain-specific natural language processing on real-world questions. We hope that this dataset will lead to wider research interest in this important health domain.}
}
Endnote
%0 Conference Paper
%T SleepQA: A Health Coaching Dataset on Sleep for Extractive Question Answering
%A Iva Bojic
%A Qi Chwen Ong
%A Megh Thakkar
%A Esha Kamran
%A Irving Yu Le Shua
%A Jaime Rei Ern Pang
%A Jessica Chen
%A Vaaruni Nayak
%A Shafiq Joty
%A Josip Car
%B Proceedings of the 2nd Machine Learning for Health symposium
%C Proceedings of Machine Learning Research
%D 2022
%E Antonio Parziale
%E Monica Agrawal
%E Shalmali Joshi
%E Irene Y. Chen
%E Shengpu Tang
%E Luis Oala
%E Adarsh Subbaswamy
%F pmlr-v193-bojic22a
%I PMLR
%P 199--217
%U https://proceedings.mlr.press/v193/bojic22a.html
%V 193
%X Question Answering (QA) systems can support health coaches in facilitating clients’ lifestyle behavior changes (e.g., in adopting healthy sleep habits). In this paper, we design a domain-specific QA pipeline for sleep coaching. To this end, we release SleepQA, a dataset created from 7,005 passages comprising 4,250 training examples with single annotations and 750 examples with 5-way annotations. We fine-tuned different domain-specific BERT models on our dataset and perform extensive automatic and human evaluation of the resulting end-to-end QA pipeline. Comparisons of our pipeline with baseline show improvements in domain-specific natural language processing on real-world questions. We hope that this dataset will lead to wider research interest in this important health domain.
APA
Bojic, I., Ong, Q.C., Thakkar, M., Kamran, E., Shua, I.Y.L., Pang, J.R.E., Chen, J., Nayak, V., Joty, S. & Car, J. (2022). SleepQA: A Health Coaching Dataset on Sleep for Extractive Question Answering. Proceedings of the 2nd Machine Learning for Health symposium, in Proceedings of Machine Learning Research 193:199-217. Available from https://proceedings.mlr.press/v193/bojic22a.html.
