Differential Privacy Under Class Imbalance: Methods and Empirical Insights

Lucas Rosenblatt, Yuliia Lut, Ethan Turok, Marco Avella Medina, Rachel Cummings
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:52065-52109, 2025.

Abstract

Imbalanced learning occurs in classification settings where the distribution of class-labels is highly skewed in the training data, such as when predicting rare diseases or in fraud detection. This class imbalance presents a significant algorithmic challenge, which can be further exacerbated when privacy-preserving techniques such as differential privacy are applied to protect sensitive training data. Our work formalizes these challenges and provides a number of algorithmic solutions. We consider DP variants of pre-processing methods that privately augment the original dataset to reduce the class imbalance, alongside DP variants of in-processing techniques, which adjust the learning algorithm to account for the imbalance. For each method, we either adapt an existing imbalanced learning technique to the private setting or demonstrate its incompatibility with differential privacy. Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings. We find that private synthetic data methods perform well as a data pre-processing step, while class-weighted ERMs are an alternative in higher-dimensional settings where private synthetic data suffers from the curse of dimensionality.
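The in-processing route the abstract mentions, a class-weighted ERM under differential privacy, can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: the function name is invented, and the sensitivity calibration heuristically scales the classic output-perturbation bound for strongly convex ERM by the largest class weight.

```python
import numpy as np

def dp_class_weighted_logreg(X, y, epsilon=1.0, delta=1e-5, lam=0.1,
                             n_iters=500, lr=0.1, seed=0):
    """Class-weighted L2-regularized logistic regression, privatized by
    output perturbation (Gaussian noise added to the learned coefficients).

    Illustrative sketch only: the sensitivity below adapts the standard
    output-perturbation bound for strongly convex ERM by scaling it with
    the largest class weight, a heuristic rather than the paper's analysis.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Inverse-frequency class weights: the rare class gets a larger weight.
    pos_frac = y.mean()
    w = np.where(y == 1, 1.0 / max(pos_frac, 1e-12),
                 1.0 / max(1.0 - pos_frac, 1e-12))
    w /= w.mean()  # normalize so the average weight is 1

    theta = np.zeros(d)
    for _ in range(n_iters):
        z = np.clip(X @ theta, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = X.T @ (w * (p - y)) / n + lam * theta
        theta -= lr * grad

    # Gaussian mechanism: L2 sensitivity ~ 2 * w_max / (n * lam) for a
    # 1-Lipschitz loss (heuristic extension of the unweighted bound).
    sensitivity = 2.0 * w.max() / (n * lam)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return theta + rng.normal(scale=sigma, size=d)
```

As epsilon grows the noise vanishes and this reduces to ordinary class-weighted regularized logistic regression; note that the noise scale grows with the largest class weight, which makes the privacy cost of upweighting rare examples explicit.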

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-rosenblatt25b,
  title     = {Differential Privacy Under Class Imbalance: Methods and Empirical Insights},
  author    = {Rosenblatt, Lucas and Lut, Yuliia and Turok, Ethan and Medina, Marco Avella and Cummings, Rachel},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {52065--52109},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/rosenblatt25b/rosenblatt25b.pdf},
  url       = {https://proceedings.mlr.press/v267/rosenblatt25b.html},
  abstract  = {Imbalanced learning occurs in classification settings where the distribution of class-labels is highly skewed in the training data, such as when predicting rare diseases or in fraud detection. This class imbalance presents a significant algorithmic challenge, which can be further exacerbated when privacy-preserving techniques such as differential privacy are applied to protect sensitive training data. Our work formalizes these challenges and provides a number of algorithmic solutions. We consider DP variants of pre-processing methods that privately augment the original dataset to reduce the class imbalance, alongside DP variants of in-processing techniques, which adjust the learning algorithm to account for the imbalance. For each method, we either adapt an existing imbalanced learning technique to the private setting or demonstrate its incompatibility with differential privacy. Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings. We find that private synthetic data methods perform well as a data pre-processing step, while class-weighted ERMs are an alternative in higher-dimensional settings where private synthetic data suffers from the curse of dimensionality.}
}
Endnote
%0 Conference Paper
%T Differential Privacy Under Class Imbalance: Methods and Empirical Insights
%A Lucas Rosenblatt
%A Yuliia Lut
%A Ethan Turok
%A Marco Avella Medina
%A Rachel Cummings
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-rosenblatt25b
%I PMLR
%P 52065--52109
%U https://proceedings.mlr.press/v267/rosenblatt25b.html
%V 267
%X Imbalanced learning occurs in classification settings where the distribution of class-labels is highly skewed in the training data, such as when predicting rare diseases or in fraud detection. This class imbalance presents a significant algorithmic challenge, which can be further exacerbated when privacy-preserving techniques such as differential privacy are applied to protect sensitive training data. Our work formalizes these challenges and provides a number of algorithmic solutions. We consider DP variants of pre-processing methods that privately augment the original dataset to reduce the class imbalance, alongside DP variants of in-processing techniques, which adjust the learning algorithm to account for the imbalance. For each method, we either adapt an existing imbalanced learning technique to the private setting or demonstrate its incompatibility with differential privacy. Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings. We find that private synthetic data methods perform well as a data pre-processing step, while class-weighted ERMs are an alternative in higher-dimensional settings where private synthetic data suffers from the curse of dimensionality.
APA
Rosenblatt, L., Lut, Y., Turok, E., Medina, M.A. & Cummings, R. (2025). Differential Privacy Under Class Imbalance: Methods and Empirical Insights. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:52065-52109. Available from https://proceedings.mlr.press/v267/rosenblatt25b.html.
