Correlation Clustering in Constant Many Parallel Rounds

Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2069-2078, 2021.

Abstract

Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental scalability evaluation of our techniques.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-cohen-addad21b, title = {Correlation Clustering in Constant Many Parallel Rounds}, author = {Cohen-Addad, Vincent and Lattanzi, Silvio and Mitrovi{\'c}, Slobodan and Norouzi-Fard, Ashkan and Parotsidis, Nikos and Tarnawski, Jakub}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {2069--2078}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/cohen-addad21b/cohen-addad21b.pdf}, url = {https://proceedings.mlr.press/v139/cohen-addad21b.html}, abstract = {Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental scalability evaluation of our techniques.} }
Endnote
%0 Conference Paper %T Correlation Clustering in Constant Many Parallel Rounds %A Vincent Cohen-Addad %A Silvio Lattanzi %A Slobodan Mitrović %A Ashkan Norouzi-Fard %A Nikos Parotsidis %A Jakub Tarnawski %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-cohen-addad21b %I PMLR %P 2069--2078 %U https://proceedings.mlr.press/v139/cohen-addad21b.html %V 139 %X Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental scalability evaluation of our techniques.
APA
Cohen-Addad, V., Lattanzi, S., Mitrović, S., Norouzi-Fard, A., Parotsidis, N. & Tarnawski, J.. (2021). Correlation Clustering in Constant Many Parallel Rounds. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2069-2078 Available from https://proceedings.mlr.press/v139/cohen-addad21b.html.

Related Material