How to Solve Fair k-Center in Massive Data Models

Ashish Chiplunkar, Sagar Kale, Sivaramakrishnan Natarajan Ramamoorthy
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1877-1886, 2020.

Abstract

Fueled by massive data, important decision making is being automated with the help of algorithms, therefore, fairness in algorithms has become an especially important research topic. In this work, we design new streaming and distributed algorithms for the fair k-center problem that models fair data summarization. The streaming and distributed models of computation have an attractive feature of being able to handle massive data sets that do not fit into main memory. Our main contributions are: (a) the first distributed algorithm; which has provably constant approximation ratio and is extremely parallelizable, and (b) a two-pass streaming algorithm with a provable approximation guarantee matching the best known algorithm (which is not a streaming algorithm). Our algorithms have the advantages of being easy to implement in practice, being fast with linear running times, having very small working memory and communication, and outperforming existing algorithms on several real and synthetic data sets. To complement our distributed algorithm, we also give a hardness result for natural distributed algorithms, which holds for even the special case of k-center.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-chiplunkar20a, title = {How to Solve Fair k-Center in Massive Data Models}, author = {Chiplunkar, Ashish and Kale, Sagar and Ramamoorthy, Sivaramakrishnan Natarajan}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {1877--1886}, year = {2020}, editor = {Hal Daumé III and Aarti Singh}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/chiplunkar20a/chiplunkar20a.pdf}, url = { http://proceedings.mlr.press/v119/chiplunkar20a.html }, abstract = {Fueled by massive data, important decision making is being automated with the help of algorithms, therefore, fairness in algorithms has become an especially important research topic. In this work, we design new streaming and distributed algorithms for the fair k-center problem that models fair data summarization. The streaming and distributed models of computation have an attractive feature of being able to handle massive data sets that do not fit into main memory. Our main contributions are: (a) the first distributed algorithm; which has provably constant approximation ratio and is extremely parallelizable, and (b) a two-pass streaming algorithm with a provable approximation guarantee matching the best known algorithm (which is not a streaming algorithm). Our algorithms have the advantages of being easy to implement in practice, being fast with linear running times, having very small working memory and communication, and outperforming existing algorithms on several real and synthetic data sets. To complement our distributed algorithm, we also give a hardness result for natural distributed algorithms, which holds for even the special case of k-center.} }
Endnote
%0 Conference Paper %T How to Solve Fair k-Center in Massive Data Models %A Ashish Chiplunkar %A Sagar Kale %A Sivaramakrishnan Natarajan Ramamoorthy %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-chiplunkar20a %I PMLR %P 1877--1886 %U http://proceedings.mlr.press/v119/chiplunkar20a.html %V 119 %X Fueled by massive data, important decision making is being automated with the help of algorithms, therefore, fairness in algorithms has become an especially important research topic. In this work, we design new streaming and distributed algorithms for the fair k-center problem that models fair data summarization. The streaming and distributed models of computation have an attractive feature of being able to handle massive data sets that do not fit into main memory. Our main contributions are: (a) the first distributed algorithm; which has provably constant approximation ratio and is extremely parallelizable, and (b) a two-pass streaming algorithm with a provable approximation guarantee matching the best known algorithm (which is not a streaming algorithm). Our algorithms have the advantages of being easy to implement in practice, being fast with linear running times, having very small working memory and communication, and outperforming existing algorithms on several real and synthetic data sets. To complement our distributed algorithm, we also give a hardness result for natural distributed algorithms, which holds for even the special case of k-center.
APA
Chiplunkar, A., Kale, S. & Ramamoorthy, S.N.. (2020). How to Solve Fair k-Center in Massive Data Models. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:1877-1886 Available from http://proceedings.mlr.press/v119/chiplunkar20a.html .

Related Material