Data Augmentation for Electrocardiograms

Aniruddh Raghu, Divya Shanmugam, Eugene Pomerantsev, John Guttag, Collin M Stultz
Proceedings of the Conference on Health, Inference, and Learning, PMLR 174:282-310, 2022.

Abstract

Neural network models have demonstrated impressive performance in predicting pathologies and outcomes from the 12-lead electrocardiogram (ECG). However, these models often need to be trained with large, labelled datasets, which are not available for many predictive tasks of interest. In this work, we perform an empirical study examining whether training-time data augmentation methods can be used to improve performance on such data-scarce ECG prediction problems. We investigate how data augmentation strategies impact model performance when detecting cardiac abnormalities from the ECG. Motivated by our finding that the effectiveness of existing augmentation strategies is highly task-dependent, we introduce a new method, TaskAug, which defines a flexible augmentation policy that is optimized on a per-task basis. We outline an efficient learning algorithm to do so that leverages recent work in nested optimization and implicit differentiation. In experiments, considering three datasets and eight predictive tasks, we find that TaskAug is competitive with or improves on prior work, and the learned policies shed light on what transformations are most effective for different tasks. We distill key insights from our experimental evaluation, generating a set of best practices for applying data augmentation to ECG prediction problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v174-raghu22a,
  title     = {Data Augmentation for Electrocardiograms},
  author    = {Raghu, Aniruddh and Shanmugam, Divya and Pomerantsev, Eugene and Guttag, John and Stultz, Collin M},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages     = {282--310},
  year      = {2022},
  editor    = {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan},
  volume    = {174},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--08 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v174/raghu22a/raghu22a.pdf},
  url       = {https://proceedings.mlr.press/v174/raghu22a.html},
  abstract  = {Neural network models have demonstrated impressive performance in predicting pathologies and outcomes from the 12-lead electrocardiogram (ECG). However, these models often need to be trained with large, labelled datasets, which are not available for many predictive tasks of interest. In this work, we perform an empirical study examining whether training time data augmentation methods can be used to improve performance on such data-scarce ECG prediction problems. We investigate how data augmentation strategies impact model performance when detecting cardiac abnormalities from the ECG. Motivated by our finding that the effectiveness of existing augmentation strategies is highly task-dependent, we introduce a new method, \textit{TaskAug}, which defines a flexible augmentation policy that is optimized on a per-task basis. We outline an efficient learning algorithm to do so that leverages recent work in nested optimization and implicit differentiation. In experiments, considering three datasets and eight predictive tasks, we find that TaskAug is competitive with or improves on prior work, and the learned policies shed light on what transformations are most effective for different tasks. We distill key insights from our experimental evaluation, generating a set of best practices for applying data augmentation to ECG prediction problems.}
}
Endnote
%0 Conference Paper
%T Data Augmentation for Electrocardiograms
%A Aniruddh Raghu
%A Divya Shanmugam
%A Eugene Pomerantsev
%A John Guttag
%A Collin M Stultz
%B Proceedings of the Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Gerardo Flores
%E George H Chen
%E Tom Pollard
%E Joyce C Ho
%E Tristan Naumann
%F pmlr-v174-raghu22a
%I PMLR
%P 282--310
%U https://proceedings.mlr.press/v174/raghu22a.html
%V 174
%X Neural network models have demonstrated impressive performance in predicting pathologies and outcomes from the 12-lead electrocardiogram (ECG). However, these models often need to be trained with large, labelled datasets, which are not available for many predictive tasks of interest. In this work, we perform an empirical study examining whether training time data augmentation methods can be used to improve performance on such data-scarce ECG prediction problems. We investigate how data augmentation strategies impact model performance when detecting cardiac abnormalities from the ECG. Motivated by our finding that the effectiveness of existing augmentation strategies is highly task-dependent, we introduce a new method, \textit{TaskAug}, which defines a flexible augmentation policy that is optimized on a per-task basis. We outline an efficient learning algorithm to do so that leverages recent work in nested optimization and implicit differentiation. In experiments, considering three datasets and eight predictive tasks, we find that TaskAug is competitive with or improves on prior work, and the learned policies shed light on what transformations are most effective for different tasks. We distill key insights from our experimental evaluation, generating a set of best practices for applying data augmentation to ECG prediction problems.
APA
Raghu, A., Shanmugam, D., Pomerantsev, E., Guttag, J. & Stultz, C.M. (2022). Data Augmentation for Electrocardiograms. Proceedings of the Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 174:282-310. Available from https://proceedings.mlr.press/v174/raghu22a.html.
