PrivECG: generating private ECG for end-to-end anonymization

Alexis Nolin-Lapalme, Robert Avram, Hussin Julie
Proceedings of the 8th Machine Learning for Healthcare Conference, PMLR 219:509-528, 2023.

Abstract

The electrocardiogram (ECG) remains the cornerstone of diagnosis in cardiology, where pathologies uniquely impact its appearance, permitting the identification of underlying electrical or structural abnormalities. Notably, multiple deep learning approaches have demonstrated that disease prediction can be performed with high accuracy using ECG waveforms. However, this signal-rich modality has also been shown to be predictive of a patient’s private attributes, such as biological sex and age. More importantly, recent research has demonstrated that many medical data modalities can allow patient re-identification from the modality of interest alone, despite anonymization through current paradigms, raising important privacy concerns. In this paper, we propose a novel approach to anonymize the ECG waveforms themselves while maximizing the privacy-utility trade-off. We describe PrivECG, a generative adversarial network (GAN) framework capable of privatizing 12-lead ECGs while conserving their disease-descriptive features. PrivECG significantly decreases patient validation performance by targeting sex-linked features. Our approach reduces sex prediction accuracy from 0.876 to a near-random 0.529 by permitting greater variability in the ECG’s R-wave morphology, and raises the equal error rate (EER) on individual validation tasks from 0.098 to 0.251. Moreover, the regenerated ECGs retain most of their disease-predicting potential, with an F1 score of 0.885 compared with the baseline’s 0.931 on a multilabel disease prediction task. We further demonstrate that reintroducing sex-linked information downstream in the network recovers performance to an F1 score of 0.893. This shows that the loss of performance is due to the privatization of the sex-linked features, and the reintroduction also serves as a disambiguation tool to evaluate the impact of sex information on prediction performance. Our results suggest that our approach could allow improved anonymization of a large ECG database in minutes, in a task-independent manner, without strongly impacting downstream clinically relevant tasks.
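The abstract reports re-identification risk as an equal error rate (EER). The EER is the operating point of a verification system at which the false acceptance rate equals the false rejection rate, so a higher EER means patients are harder to re-identify. The sketch below is an illustration only, not the paper's evaluation code. It estimates the EER from two lists of similarity scores by sweeping every observed threshold. The function name `equal_error_rate` and the convention that higher scores mean "more likely the same patient" are assumptions.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate the EER of a verification system (illustrative sketch).

    Assumed convention: higher score = more likely the same patient.
    genuine:  scores for ECG pairs that come from the same patient.
    impostor: scores for ECG pairs that come from different patients.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(genuine) | set(impostor))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: a different-patient pair scored at or above t.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection: a same-patient pair scored below t.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

Perfectly separated scores, for example `equal_error_rate([0.9, 0.8], [0.1, 0.2])`, give an EER of 0. Indistinguishable genuine and impostor scores push the EER toward 0.5, which corresponds to guessing. The reported change from 0.098 to 0.251 moves in that direction.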

Cite this Paper


BibTeX
@InProceedings{pmlr-v219-nolin-lapalme23a,
  title = {PrivECG: generating private ECG for end-to-end anonymization},
  author = {Nolin-Lapalme, Alexis and Avram, Robert and Julie, Hussin},
  booktitle = {Proceedings of the 8th Machine Learning for Healthcare Conference},
  pages = {509--528},
  year = {2023},
  editor = {Deshpande, Kaivalya and Fiterau, Madalina and Joshi, Shalmali and Lipton, Zachary and Ranganath, Rajesh and Urteaga, Iñigo and Yeung, Serene},
  volume = {219},
  series = {Proceedings of Machine Learning Research},
  month = {11--12 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v219/nolin-lapalme23a/nolin-lapalme23a.pdf},
  url = {https://proceedings.mlr.press/v219/nolin-lapalme23a.html},
  abstract = {The electrocardiogram (ECG) remains the cornerstone of diagnosis in cardiology where, pathologies uniquely impact its appearance, permitting the identification of underlying electrical or structural abnormalities. Notably, multiple deep learning approaches have demonstrated that disease prediction could be performed with high accuracy using ECG waveforms. However, this signal-rich modality has also demonstrated the potential to be predictive of a patient’s private attributes such as biological sex and age. More importantly, recent research has demonstrated that many medical data modalities could allow patient re-identification with only the modality of interest despite anonymization through current paradigms, raising important privacy concerns. In this paper, we propose a novel approach to anonymize the ECG waveforms themselves while maximizing the privacy-utility trade-off. We describe PrivECG, a generative adversarial network (GAN) framework capable of privatizing 12-lead ECGs while conserving their disease-descriptive features. PrivECG significantly decreases patient validation performances by targeting sex-linked features. Our approach reduces sex prediction accuracy from 0.876 to near-random 0.529, by permitting greater variability of the ECG’s R-wave morphology, as well as bringing the equal error rate (EER) from 0.098 to 0.251 on individual validation tasks. Moreover, the regenerated ECGs maintain a majority of their disease-predicting potential, with an F1 score of 0.885 from the baseline’s 0.931 on a multilabel disease prediction task. We further demonstrate that reintroducing sex-linked information downstream in the network allows recuperating performances with an F1 score of 0.893 proving our loss of performance is due to the privatization of the sex-linked features, as well as serves as a disambiguation tool to evaluate the impact of sex information on prediction performances. Our results suggest that our approach could allow improved anonymization of a large ECG database in minutes without strongly impacting downstream clinically-relevant tasks in a task-independent manner.}
}
Endnote
%0 Conference Paper
%T PrivECG: generating private ECG for end-to-end anonymization
%A Alexis Nolin-Lapalme
%A Robert Avram
%A Hussin Julie
%B Proceedings of the 8th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Kaivalya Deshpande
%E Madalina Fiterau
%E Shalmali Joshi
%E Zachary Lipton
%E Rajesh Ranganath
%E Iñigo Urteaga
%E Serene Yeung
%F pmlr-v219-nolin-lapalme23a
%I PMLR
%P 509--528
%U https://proceedings.mlr.press/v219/nolin-lapalme23a.html
%V 219
%X The electrocardiogram (ECG) remains the cornerstone of diagnosis in cardiology where, pathologies uniquely impact its appearance, permitting the identification of underlying electrical or structural abnormalities. Notably, multiple deep learning approaches have demonstrated that disease prediction could be performed with high accuracy using ECG waveforms. However, this signal-rich modality has also demonstrated the potential to be predictive of a patient’s private attributes such as biological sex and age. More importantly, recent research has demonstrated that many medical data modalities could allow patient re-identification with only the modality of interest despite anonymization through current paradigms, raising important privacy concerns. In this paper, we propose a novel approach to anonymize the ECG waveforms themselves while maximizing the privacy-utility trade-off. We describe PrivECG, a generative adversarial network (GAN) framework capable of privatizing 12-lead ECGs while conserving their disease-descriptive features. PrivECG significantly decreases patient validation performances by targeting sex-linked features. Our approach reduces sex prediction accuracy from 0.876 to near-random 0.529, by permitting greater variability of the ECG’s R-wave morphology, as well as bringing the equal error rate (EER) from 0.098 to 0.251 on individual validation tasks. Moreover, the regenerated ECGs maintain a majority of their disease-predicting potential, with an F1 score of 0.885 from the baseline’s 0.931 on a multilabel disease prediction task. We further demonstrate that reintroducing sex-linked information downstream in the network allows recuperating performances with an F1 score of 0.893 proving our loss of performance is due to the privatization of the sex-linked features, as well as serves as a disambiguation tool to evaluate the impact of sex information on prediction performances. Our results suggest that our approach could allow improved anonymization of a large ECG database in minutes without strongly impacting downstream clinically-relevant tasks in a task-independent manner.
APA
Nolin-Lapalme, A., Avram, R., & Julie, H. (2023). PrivECG: generating private ECG for end-to-end anonymization. Proceedings of the 8th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 219:509-528. Available from https://proceedings.mlr.press/v219/nolin-lapalme23a.html.
