DBSCAN++: Towards fast and scalable density clustering
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3019-3029, 2019.
Abstract
DBSCAN is a classical density-based clustering procedure with tremendous practical relevance. However, DBSCAN implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which is too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We also present statistical consistency guarantees showing the tradeoff between computational cost and estimation rates. Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a subquadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest.
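
The idea of computing densities only for a subset of points can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dbscan_pp_sketch`, its parameters, the uniform random choice of the subset, and the brute-force distance loops are all assumptions made for illustration. Density is estimated for `m` sampled points only, the sampled core points within `eps` of each other are linked into clusters, and every remaining point is assigned to its nearest core point if one lies within `eps`.

```python
import math
import random


def dbscan_pp_sketch(points, eps, min_pts, m, seed=0):
    """Illustrative subsample-based density clustering (hypothetical helper).

    Returns one label per point; -1 marks noise.
    """
    rng = random.Random(seed)
    n = len(points)
    # Assumption: the subset is drawn uniformly at random without replacement.
    subset = rng.sample(range(n), min(m, n))

    # Density is estimated only for the sampled points, which costs
    # O(m * n) distance computations instead of O(n^2).
    core = [
        i for i in subset
        if sum(math.dist(points[i], q) <= eps for q in points) >= min_pts
    ]

    # Link core points that lie within eps of each other (union-find).
    parent = {i: i for i in core}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(core):
        for b in core[pos + 1:]:
            if math.dist(points[a], points[b]) <= eps:
                parent[find(a)] = find(b)

    # Give each connected component of core points a compact cluster id.
    cluster_id = {}
    for i in core:
        cluster_id.setdefault(find(i), len(cluster_id))

    # Assign every point to the cluster of its nearest core point,
    # or to noise (-1) if no core point lies within eps.
    labels = []
    for p in points:
        best, best_d = -1, eps
        for c in core:
            d = math.dist(p, points[c])
            if d <= best_d:
                best, best_d = cluster_id[find(c)], d
        labels.append(best)
    return labels
```

Setting `m` equal to the number of points makes the sketch estimate the density of every point, as ordinary DBSCAN does; smaller values of `m` reduce the number of density computations at the risk of missing sparsely sampled clusters.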