$h$-DBSCAN: A simple fast DBSCAN algorithm for big data
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:81-96, 2021.
DBSCAN is a classical clustering algorithm, which can identify different shapes and isolate noisy patterns from a dataset. Despite the above advantages, the bottleneck of DBSCAN is its computation time for high dimensional datasets. This work, thus, presents a simple and fast method to improve the efficiency of DBSCAN algorithm. We reduce the execution time in two aspects. The first one is to reduce the number of points presented to DBSCAN and the second one is to apply the HNSW technique instead of the linear search structure for improving its efficiency. The experimental results show that our proposed algorithm can greatly improve the clustering speed without losing or even obtaining better accuracy, especially for large-scale datasets.