Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints

Ehsan Kazemi; Morteza Zadimoghaddam; Amin Karbasi

Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints

Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2544-2553, 2018.

Abstract

Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80%$ of data deletion.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-kazemi18a,
  title = 	 {Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints},
  author =       {Kazemi, Ehsan and Zadimoghaddam, Morteza and Karbasi, Amin},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {2544--2553},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/kazemi18a/kazemi18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/kazemi18a.html},
  abstract = 	 {Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80%$ of data deletion.}
}

Endnote

%0 Conference Paper
%T Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints
%A Ehsan Kazemi
%A Morteza Zadimoghaddam
%A Amin Karbasi
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-kazemi18a
%I PMLR
%P 2544--2553
%U https://proceedings.mlr.press/v80/kazemi18a.html
%V 80
%X Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80%$ of data deletion.

APA


Kazemi, E., Zadimoghaddam, M. & Karbasi, A.. (2018). Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2544-2553 Available from https://proceedings.mlr.press/v80/kazemi18a.html.

Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints

Abstract

Cite this Paper

Related Material