Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data

Yvonne Zhou, Mingyu Liang, Ivan Brugere, Danial Dervovic, Antigoni Polychroniadou, Min Wu, Dana Dachman-Soled
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:61979-62001, 2024.

Abstract

The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
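To make the setting concrete, below is a minimal Python sketch (using numpy and scikit-learn) of the quantity the abstract studies: the excess empirical risk of a linear model trained on synthetic data, measured against a model trained on the real data. The "synthetic" generator here is a naive, non-private stand-in that samples each feature independently from its empirical 1-way marginal, and the synthetic labels are produced by the same toy rule as the real ones; this is purely illustrative and is not the paper's DP mechanism, which would match low-order marginals under a formal privacy guarantee.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Toy "real" dataset: binary features, labels from a noisy linear rule.
n, d = 2000, 5
X_real = rng.integers(0, 2, size=(n, d))
w_true = rng.normal(size=d)
thresh = 0.5 * w_true.sum()
y_real = (X_real @ w_true + 0.1 * rng.normal(size=n) > thresh).astype(int)

# Naive stand-in generator: sample each feature independently from its
# empirical 1-way marginal. This preserves 1-way marginals but is NOT
# differentially private and ignores higher-order structure; a real
# instantiation would use a DP, marginal-preserving mechanism.
p_marginal = X_real.mean(axis=0)
X_syn = (rng.random((n, d)) < p_marginal).astype(int)
y_syn = (X_syn @ w_true + 0.1 * rng.normal(size=n) > thresh).astype(int)  # illustrative labels

# Excess empirical risk proxy: logistic loss on the REAL training set of the
# synthetic-data model, minus that of the real-data model (a stand-in for
# the empirical risk minimizer; sklearn's default L2 penalty makes this
# only approximate).
clf_real = LogisticRegression().fit(X_real, y_real)
clf_syn = LogisticRegression().fit(X_syn, y_syn)
risk_real = log_loss(y_real, clf_real.predict_proba(X_real))
risk_syn = log_loss(y_real, clf_syn.predict_proba(X_real))
print(f"empirical risk (real-trained):      {risk_real:.4f}")
print(f"empirical risk (synthetic-trained): {risk_syn:.4f}")
print(f"excess empirical risk:              {risk_syn - risk_real:.4f}")

Note that the logistic loss used here is continuous and Lipschitz, matching the class of loss functions for which the paper's upper and lower bounds apply.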

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhou24k,
  title = {Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data},
  author = {Zhou, Yvonne and Liang, Mingyu and Brugere, Ivan and Dervovic, Danial and Polychroniadou, Antigoni and Wu, Min and Dachman-Soled, Dana},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {61979--62001},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhou24k/zhou24k.pdf},
  url = {https://proceedings.mlr.press/v235/zhou24k.html},
  abstract = {The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.}
}
Endnote
%0 Conference Paper
%T Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data
%A Yvonne Zhou
%A Mingyu Liang
%A Ivan Brugere
%A Danial Dervovic
%A Antigoni Polychroniadou
%A Min Wu
%A Dana Dachman-Soled
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhou24k
%I PMLR
%P 61979--62001
%U https://proceedings.mlr.press/v235/zhou24k.html
%V 235
%X The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
APA
Zhou, Y., Liang, M., Brugere, I., Dervovic, D., Polychroniadou, A., Wu, M. & Dachman-Soled, D. (2024). Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:61979-62001. Available from https://proceedings.mlr.press/v235/zhou24k.html.
