Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

Marius Fracarolli, Michael Staniek, Stefan Riezler
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:1412-1426, 2026.

Abstract

Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore how data augmentation can mitigate Membership Inference Attacks (MIAs) on TSF models. We show that retraining with synthetic data can substantially reduce the effectiveness of loss-based MIAs by lowering the attacker's ratio of true-positive rate to false-positive rate (TPR/FPR). The key challenge is generating synthetic samples that resemble the original training data closely enough to confuse the attacker, while introducing enough novelty to improve the model's ability to generalize to unseen data. We examine several augmentation strategies, namely Zeroth-Order Optimization (ZOO), a variant of ZOO constrained by Principal Component Analysis (ZOO-PCA), and MixUp, to strengthen model resilience without sacrificing accuracy. Our experimental results show that ZOO-PCA yields the largest reductions in the TPR/FPR ratio of MIAs without sacrificing performance on test data.
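The abstract relies on two generic building blocks: a loss-based membership inference attack scored by its TPR/FPR ratio, and MixUp as an augmentation baseline. The following is a minimal Python sketch of both, not the authors' code. It assumes the attacker labels a sample a training member when its per-sample loss falls below a fixed threshold, and that embeddings and forecast targets are plain lists of floats. The function names, the threshold, the toy losses, and the Beta(0.2, 0.2) mixing prior are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def loss_threshold_tpr_fpr(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Hypothetical loss-based MIA: predict "member" when loss < threshold.

    Returns (TPR, FPR): the fraction of true training members and of
    held-out non-members that the attacker labels as members.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("need at least one member and one non-member loss")
    tp = sum(1 for loss in member_losses if loss < threshold)
    fp = sum(1 for loss in nonmember_losses if loss < threshold)
    return tp / len(member_losses), fp / len(nonmember_losses)


def tpr_fpr_ratio(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> float:
    """TPR/FPR at a fixed threshold; returns math.inf when FPR is 0."""
    tpr, fpr = loss_threshold_tpr_fpr(member_losses, nonmember_losses, threshold)
    return tpr / fpr if fpr > 0 else math.inf


def mixup_pair(
    x1: Sequence[float],
    x2: Sequence[float],
    y1: Sequence[float],
    y2: Sequence[float],
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Standard MixUp on two embedding vectors and their forecast targets.

    Draws lam ~ Beta(alpha, alpha) and applies the same convex combination
    lam * first + (1 - lam) * second to both inputs and targets.
    """
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("paired vectors must have equal length")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Toy usage with made-up losses (not results from the paper).
members = [0.1, 0.2, 0.3, 0.9]
nonmembers = [0.4, 0.5, 0.25, 1.0]
print(tpr_fpr_ratio(members, nonmembers, threshold=0.35))  # 3.0
```

A ratio near 1 means TPR is close to FPR at that threshold, so the attacker separates members from non-members no better than chance; augmentation that pulls member and non-member losses together pushes the ratio toward 1.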

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-fracarolli26a,
  title = {Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting},
  author = {Fracarolli, Marius and Staniek, Michael and Riezler, Stefan},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {1412--1426},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/fracarolli26a/fracarolli26a.pdf},
  url = {https://proceedings.mlr.press/v297/fracarolli26a.html},
  abstract = {Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting ({TSF}) tasks involving Electronic Health Records ({EHR}). In this study, we explore how data augmentation can mitigate Membership Inference Attacks ({MIA}) on {TSF} models. We show that retraining with synthetic data can substantially reduce the effectiveness of loss-based {MIA}s by reducing the attacker’s true-positive to false-positive ratio. The key challenge is generating synthetic samples that closely resemble the original training data to confuse the attacker, while also introducing enough novelty to enhance the model’s ability to generalize to unseen data. We examine multiple augmentation strategies --- Zeroth-Order Optimization ({ZOO}), a variant of {ZOO} constrained by Principal Component Analysis ({ZOO-PCA}), and {MixUp} --- to strengthen model resilience without sacrificing accuracy. Our experimental results show that {ZOO-PCA} yields the best reductions in {TPR/FPR} ratio for {MIA} attacks without sacrificing performance on test data.}
}
Endnote
%0 Conference Paper
%T Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting
%A Marius Fracarolli
%A Michael Staniek
%A Stefan Riezler
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-fracarolli26a
%I PMLR
%P 1412--1426
%U https://proceedings.mlr.press/v297/fracarolli26a.html
%V 297
%X Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore how data augmentation can mitigate Membership Inference Attacks (MIA) on TSF models. We show that retraining with synthetic data can substantially reduce the effectiveness of loss-based MIAs by reducing the attacker's true-positive to false-positive ratio. The key challenge is generating synthetic samples that closely resemble the original training data to confuse the attacker, while also introducing enough novelty to enhance the model's ability to generalize to unseen data. We examine multiple augmentation strategies, namely Zeroth-Order Optimization (ZOO), a variant of ZOO constrained by Principal Component Analysis (ZOO-PCA), and MixUp, to strengthen model resilience without sacrificing accuracy. Our experimental results show that ZOO-PCA yields the best reductions in TPR/FPR ratio for MIA attacks without sacrificing performance on test data.
APA
Fracarolli, M., Staniek, M., & Riezler, S. (2026). Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:1412-1426. Available from https://proceedings.mlr.press/v297/fracarolli26a.html.
