Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints

Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2544-2553, 2018.

Abstract

Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80%$ of data deletion.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-kazemi18a, title = {Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints}, author = {Kazemi, Ehsan and Zadimoghaddam, Morteza and Karbasi, Amin}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {2544--2553}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/kazemi18a/kazemi18a.pdf}, url = {http://proceedings.mlr.press/v80/kazemi18a.html}, abstract = {Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80%$ of data deletion.} }
Endnote
%0 Conference Paper %T Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints %A Ehsan Kazemi %A Morteza Zadimoghaddam %A Amin Karbasi %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-kazemi18a %I PMLR %P 2544--2553 %U http://proceedings.mlr.press/v80/kazemi18a.html %V 80 %X Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80%$ of data deletion.
APA
Kazemi, E., Zadimoghaddam, M. & Karbasi, A.. (2018). Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2544-2553 Available from http://proceedings.mlr.press/v80/kazemi18a.html.

Related Material