Fair and Diverse DPP-Based Data Summarization

Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, Nisheeth Vishnoi
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:716-725, 2018.

Abstract

Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias (e.g., under- or over-representation of a particular gender or ethnicity) in such data summarization methods. In this paper, we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and the corresponding distributions (DPPs), and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Designing efficient algorithms to sample from these constrained determinantal distributions, however, faces a complexity barrier; we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case.
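To make the determinantal diversity measure and the idea of a fairness-constrained DPP concrete, here is a minimal sketch under stated assumptions. It scores a subset S of feature vectors by det(V_S V_S^T), the squared volume they span, and restricts sampling to subsets that take a fixed number of items from each group. The per-group count constraint, the function names (`gram_det`, `fair_subsets`, `constrained_dpp_distribution`, `sample_fair_dpp`) and the toy data are hypothetical illustrations. This is not the fast sampler from the paper: it enumerates every feasible subset, so it is exact but exponential-time and only usable on tiny inputs.

```python
import itertools

import numpy as np


def gram_det(V, S):
    """Determinantal diversity of subset S: det(V_S V_S^T), i.e. the squared
    volume spanned by the selected rows of V. Clipped at 0 to absorb
    floating-point noise on (near-)singular subsets."""
    VS = V[list(S)]
    return max(0.0, float(np.linalg.det(VS @ VS.T)))


def fair_subsets(groups, counts):
    """Enumerate subsets taking exactly counts[j] items from groups[j].
    ASSUMPTION: a per-group count constraint is used here as one example
    of a fairness constraint; the helper is hypothetical."""
    per_group = [itertools.combinations(g, c) for g, c in zip(groups, counts)]
    for choice in itertools.product(*per_group):
        yield tuple(i for part in choice for i in part)


def constrained_dpp_distribution(V, groups, counts):
    """Exact distribution P(S) proportional to det(V_S V_S^T), restricted
    to the fair subsets. Brute force: exponential in the input size."""
    subsets = list(fair_subsets(groups, counts))
    weights = np.array([gram_det(V, S) for S in subsets])
    total = weights.sum()
    if total <= 0:
        raise ValueError("no fair subset spans a positive volume")
    return subsets, weights / total


def sample_fair_dpp(V, groups, counts, rng):
    """Draw one fair subset from the constrained determinantal distribution."""
    subsets, probs = constrained_dpp_distribution(V, groups, counts)
    idx = rng.choice(len(subsets), p=probs)
    return subsets[idx]


if __name__ == "__main__":
    # Toy data (hypothetical): items 0,1 form group A; items 2,3 form group B.
    V = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    groups, counts = [[0, 1], [2, 3]], [1, 1]
    subsets, probs = constrained_dpp_distribution(V, groups, counts)
    for S, p in zip(subsets, probs):
        print(S, round(p, 3))
    print("sample:", sample_fair_dpp(V, groups, counts, np.random.default_rng(0)))
```

With these toy vectors, the subset (0, 2) pairs two orthogonal vectors and receives the largest probability (1/3.26), while (1, 3) pairs two fairly similar vectors and receives the smallest (0.64/3.26). Every subset in the support contains exactly one item from each group, which is how the constraint enforces representation while the determinant still rewards diversity.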

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-celis18a,
  title = {Fair and Diverse {DPP}-Based Data Summarization},
  author = {Celis, Elisa and Keswani, Vijay and Straszak, Damian and Deshpande, Amit and Kathuria, Tarun and Vishnoi, Nisheeth},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {716--725},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/celis18a/celis18a.pdf},
  url = {http://proceedings.mlr.press/v80/celis18a.html},
  abstract = {Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias {–} e.g., under or over representation of a particular gender or ethnicity {–} in such data summarization methods. In this paper we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and corresponding distributions (DPPs) and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Designing efficient algorithms to sample from these constrained determinantal distributions, however, suffers from a complexity barrier; we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case.}
}
Endnote
%0 Conference Paper
%T Fair and Diverse DPP-Based Data Summarization
%A Elisa Celis
%A Vijay Keswani
%A Damian Straszak
%A Amit Deshpande
%A Tarun Kathuria
%A Nisheeth Vishnoi
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-celis18a
%I PMLR
%P 716--725
%U http://proceedings.mlr.press/v80/celis18a.html
%V 80
%X Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias (e.g., under or over representation of a particular gender or ethnicity) in such data summarization methods. In this paper we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and corresponding distributions (DPPs) and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Designing efficient algorithms to sample from these constrained determinantal distributions, however, suffers from a complexity barrier; we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case.
APA
Celis, E., Keswani, V., Straszak, D., Deshpande, A., Kathuria, T. & Vishnoi, N. (2018). Fair and Diverse DPP-Based Data Summarization. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:716-725. Available from http://proceedings.mlr.press/v80/celis18a.html.
