Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs

Slobodan Mitrovic, Theodore Pan
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:35876-35891, 2024.

Abstract

Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, the aim is to find an induced subgraph whose edge-to-vertex ratio is maximized. We show how to find a $(2+\epsilon)$ approximation of the directed densest subgraph on randomized streams in a single pass while using $O(n \cdot {\rm poly} \log n)$ memory on $n$-vertex graphs. In contrast, the approach by Bahmani et al. (VLDB 2012) uses $O(\log n)$ passes and by Esfandiari et al. (2015) makes one pass but uses $O(n^{3/2})$ memory; both algorithms also apply to arbitrary-ordered streams. Our techniques extend to Massively Parallel Computation (MPC), yielding quadratic improvement over state-of-the-art by Bahmani et al. (VLDB 2012 and WAW 2014). We empirically show that the quality of our output is essentially the same as that of Bahmani et al. (VLDB 2012) while being $2$ times faster on large graphs, even on non-randomly ordered streams.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-mitrovic24a, title = {Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs}, author = {Mitrovic, Slobodan and Pan, Theodore}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {35876--35891}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/mitrovic24a/mitrovic24a.pdf}, url = {https://proceedings.mlr.press/v235/mitrovic24a.html}, abstract = {Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, the aim is to find an induced subgraph whose edge-to-vertex ratio is maximized. We show how to find a $(2+\epsilon)$ approximation of the directed densest subgraph on randomized streams in a single pass while using $O(n \cdot {\rm poly} \log n)$ memory on $n$-vertex graphs. In contrast, the approach by Bahmani et al. (VLDB 2012) uses $O(\log n)$ passes and by Esfandiari et al. (2015) makes one pass but uses $O(n^{3/2})$ memory; both algorithms also apply to arbitrary-ordered streams. Our techniques extend to Massively Parallel Computation (MPC), yielding quadratic improvement over state-of-the-art by Bahmani et al. (VLDB 2012 and WAW 2014). We empirically show that the quality of our output is essentially the same as that of Bahmani et al. (VLDB 2012) while being $2$ times faster on large graphs, even on non-randomly ordered streams.} }
Endnote
%0 Conference Paper %T Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs %A Slobodan Mitrovic %A Theodore Pan %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-mitrovic24a %I PMLR %P 35876--35891 %U https://proceedings.mlr.press/v235/mitrovic24a.html %V 235 %X Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, the aim is to find an induced subgraph whose edge-to-vertex ratio is maximized. We show how to find a $(2+\epsilon)$ approximation of the directed densest subgraph on randomized streams in a single pass while using $O(n \cdot {\rm poly} \log n)$ memory on $n$-vertex graphs. In contrast, the approach by Bahmani et al. (VLDB 2012) uses $O(\log n)$ passes and by Esfandiari et al. (2015) makes one pass but uses $O(n^{3/2})$ memory; both algorithms also apply to arbitrary-ordered streams. Our techniques extend to Massively Parallel Computation (MPC), yielding quadratic improvement over state-of-the-art by Bahmani et al. (VLDB 2012 and WAW 2014). We empirically show that the quality of our output is essentially the same as that of Bahmani et al. (VLDB 2012) while being $2$ times faster on large graphs, even on non-randomly ordered streams.
APA
Mitrovic, S. & Pan, T.. (2024). Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:35876-35891 Available from https://proceedings.mlr.press/v235/mitrovic24a.html.

Related Material