[edit]
Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19787-19808, 2025.
Abstract
Graph-based data structures have become powerful and ubiquitous tools for scalable approximate nearest-neighbor (ANN) search over the past decade. In spite of their apparent practical performance, there has only recently been progress on the worst-case performance of these data structures. Indeed, the influential work of Indyx and Xu (2023) introduced the key concept of $\alpha$-reachable graphs, showing that graphs constructed by the DiskANN algorithm (Subramanya, et. al. 2023) produce an $\left(\frac{\alpha+1}{\alpha-1}\right)$-approximate solution with a simple best-first search that runs in poly-logarithmic query time. In our work, we improve and generalize this analysis as follows: - We introduce sorted $\alpha$-reachable graphs, and use this notion to obtain a stronger approximation factor of $\frac{\alpha}{\alpha-1}$ for the DiskANN algorithm on Euclidean metrics. - We present the first worst-case theoretical analysis for the popular beam-search algorithm, which is used in practice to search these graphs for $k > 1$ candidate nearest neighbors. We also present empirical results validating the significance of sorted $\alpha$-reachable graphs, which aligns with our theoretical findings.