Dependent randomized rounding for clustering and partition systems with knapsack constraints

David Harris; Thomas Pensyl; Aravind Srinivasan; Khoa Trinh

Dependent randomized rounding for clustering and partition systems with knapsack constraints

David Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2273-2283, 2020.

Abstract

Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on \emph{fairness} in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with “knapsack" and “partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool we develop and use, which may be of independent interest, is a new tail bound analogous to Feige (2006) for sums of random variables with unbounded variances. Such bounds are very useful in inferring properties of large networks using few samples.

Cite this Paper

BibTeX

@InProceedings{pmlr-v108-harris20a,
  title = 	 {Dependent randomized rounding for clustering and partition systems with knapsack constraints},
  author =       {Harris, David and Pensyl, Thomas and Srinivasan, Aravind and Trinh, Khoa},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2273--2283},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/harris20a/harris20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/harris20a.html},
  abstract = 	 {Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on \emph{fairness} in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with “knapsack" and “partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool we develop and use, which may be of independent interest, is a new tail bound analogous to Feige (2006) for sums of random variables with unbounded variances. Such bounds are very useful in inferring properties of large networks using few samples.}
}

Endnote

%0 Conference Paper
%T Dependent randomized rounding for clustering and partition systems with knapsack constraints
%A David Harris
%A Thomas Pensyl
%A Aravind Srinivasan
%A Khoa Trinh
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra	
%F pmlr-v108-harris20a
%I PMLR
%P 2273--2283
%U https://proceedings.mlr.press/v108/harris20a.html
%V 108
%X Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on \emph{fairness} in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with “knapsack" and “partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool we develop and use, which may be of independent interest, is a new tail bound analogous to Feige (2006) for sums of random variables with unbounded variances. Such bounds are very useful in inferring properties of large networks using few samples.

APA

Harris, D., Pensyl, T., Srinivasan, A. & Trinh, K.. (2020). Dependent randomized rounding for clustering and partition systems with knapsack constraints. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2273-2283 Available from https://proceedings.mlr.press/v108/harris20a.html.

Related Material

Download PDF