Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs

Siddharth Gollapudi, Ravishankar Krishnaswamy, Kirankumar Shiragur, Harsh Wardhan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19787-19808, 2025.

Abstract

Graph-based data structures have become powerful and ubiquitous tools for scalable approximate nearest-neighbor (ANN) search over the past decade. In spite of their apparent practical performance, there has only recently been progress on the worst-case performance of these data structures. Indeed, the influential work of Indyx and Xu (2023) introduced the key concept of $\alpha$-reachable graphs, showing that graphs constructed by the DiskANN algorithm (Subramanya, et. al. 2023) produce an $\left(\frac{\alpha+1}{\alpha-1}\right)$-approximate solution with a simple best-first search that runs in poly-logarithmic query time. In our work, we improve and generalize this analysis as follows: - We introduce sorted $\alpha$-reachable graphs, and use this notion to obtain a stronger approximation factor of $\frac{\alpha}{\alpha-1}$ for the DiskANN algorithm on Euclidean metrics. - We present the first worst-case theoretical analysis for the popular beam-search algorithm, which is used in practice to search these graphs for $k > 1$ candidate nearest neighbors. We also present empirical results validating the significance of sorted $\alpha$-reachable graphs, which aligns with our theoretical findings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-gollapudi25a, title = {Sort Before You Prune: Improved Worst-Case Guarantees of the {D}isk{ANN} Family of Graphs}, author = {Gollapudi, Siddharth and Krishnaswamy, Ravishankar and Shiragur, Kirankumar and Wardhan, Harsh}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {19787--19808}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gollapudi25a/gollapudi25a.pdf}, url = {https://proceedings.mlr.press/v267/gollapudi25a.html}, abstract = {Graph-based data structures have become powerful and ubiquitous tools for scalable approximate nearest-neighbor (ANN) search over the past decade. In spite of their apparent practical performance, there has only recently been progress on the worst-case performance of these data structures. Indeed, the influential work of Indyx and Xu (2023) introduced the key concept of $\alpha$-reachable graphs, showing that graphs constructed by the DiskANN algorithm (Subramanya, et. al. 2023) produce an $\left(\frac{\alpha+1}{\alpha-1}\right)$-approximate solution with a simple best-first search that runs in poly-logarithmic query time. In our work, we improve and generalize this analysis as follows: - We introduce sorted $\alpha$-reachable graphs, and use this notion to obtain a stronger approximation factor of $\frac{\alpha}{\alpha-1}$ for the DiskANN algorithm on Euclidean metrics. - We present the first worst-case theoretical analysis for the popular beam-search algorithm, which is used in practice to search these graphs for $k > 1$ candidate nearest neighbors. We also present empirical results validating the significance of sorted $\alpha$-reachable graphs, which aligns with our theoretical findings.} }
Endnote
%0 Conference Paper %T Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs %A Siddharth Gollapudi %A Ravishankar Krishnaswamy %A Kirankumar Shiragur %A Harsh Wardhan %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-gollapudi25a %I PMLR %P 19787--19808 %U https://proceedings.mlr.press/v267/gollapudi25a.html %V 267 %X Graph-based data structures have become powerful and ubiquitous tools for scalable approximate nearest-neighbor (ANN) search over the past decade. In spite of their apparent practical performance, there has only recently been progress on the worst-case performance of these data structures. Indeed, the influential work of Indyx and Xu (2023) introduced the key concept of $\alpha$-reachable graphs, showing that graphs constructed by the DiskANN algorithm (Subramanya, et. al. 2023) produce an $\left(\frac{\alpha+1}{\alpha-1}\right)$-approximate solution with a simple best-first search that runs in poly-logarithmic query time. In our work, we improve and generalize this analysis as follows: - We introduce sorted $\alpha$-reachable graphs, and use this notion to obtain a stronger approximation factor of $\frac{\alpha}{\alpha-1}$ for the DiskANN algorithm on Euclidean metrics. - We present the first worst-case theoretical analysis for the popular beam-search algorithm, which is used in practice to search these graphs for $k > 1$ candidate nearest neighbors. We also present empirical results validating the significance of sorted $\alpha$-reachable graphs, which aligns with our theoretical findings.
APA
Gollapudi, S., Krishnaswamy, R., Shiragur, K. & Wardhan, H.. (2025). Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:19787-19808 Available from https://proceedings.mlr.press/v267/gollapudi25a.html.

Related Material