UPSTAGE: Unsupervised Context Augmentation for Utterance Classification in Patient-Provider Communication
Proceedings of the 5th Machine Learning for Healthcare Conference, PMLR 126:895-912, 2020.
Conversations between patients and providers in clinical settings are a source of natural language data that may reflect, and correlate with, patients’ experience of and response to the treatment they are receiving. When analyzing utterances in such conversations, it is not sufficient to consider each sentence in isolation, since its context may play a role in determining its meaning. Recently, contextual information in natural language documents has been modeled using various techniques, such as recurrent neural networks with latent variables and neural networks with attention mechanisms. In this paper, we present UnsuPerviSed conText AuGmEntation (UPSTAGE), a classification framework that relies on both local and global contextual information from different sources. UPSTAGE uses transformer models built on pretrained language models, together with a joint sentence representation, to classify health topics in patient-provider conversations. In addition, UPSTAGE leverages unlabeled corpora for pretraining and data augmentation to provide additional context, which leads to improved classification performance.