Why Are Learned Indexes So Effective?

Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3123-3132, 2020.

Abstract

A recent trend in algorithm design consists of augmenting classic data structures with machine learning models, which are better suited to reveal and exploit patterns and trends in the input data so to achieve outstanding practical improvements in space occupancy and time efficiency. This is especially known in the context of indexing data structures where, despite few attempts in evaluating their asymptotic efficiency, theoretical results are yet missing in showing that learned indexes are provably better than classic indexes, such as B+ trees and their variants. In this paper, we present the first mathematically-grounded answer to this open problem. We obtain this result by discovering and exploiting a link between the original problem and a mean exit time problem over a proper stochastic process which, we show, is related to the space and time occupancy of those learned indexes. Our general result is then specialised to five well-known distributions: Uniform, Lognormal, Pareto, Exponential, and Gamma; and it is corroborated in precision and robustness by a large set of experiments.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-ferragina20a, title = {Why Are Learned Indexes So Effective?}, author = {Ferragina, Paolo and Lillo, Fabrizio and Vinciguerra, Giorgio}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {3123--3132}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/ferragina20a/ferragina20a.pdf}, url = {https://proceedings.mlr.press/v119/ferragina20a.html}, abstract = {A recent trend in algorithm design consists of augmenting classic data structures with machine learning models, which are better suited to reveal and exploit patterns and trends in the input data so to achieve outstanding practical improvements in space occupancy and time efficiency. This is especially known in the context of indexing data structures where, despite few attempts in evaluating their asymptotic efficiency, theoretical results are yet missing in showing that learned indexes are provably better than classic indexes, such as B+ trees and their variants. In this paper, we present the first mathematically-grounded answer to this open problem. We obtain this result by discovering and exploiting a link between the original problem and a mean exit time problem over a proper stochastic process which, we show, is related to the space and time occupancy of those learned indexes. Our general result is then specialised to five well-known distributions: Uniform, Lognormal, Pareto, Exponential, and Gamma; and it is corroborated in precision and robustness by a large set of experiments.} }
Endnote
%0 Conference Paper %T Why Are Learned Indexes So Effective? %A Paolo Ferragina %A Fabrizio Lillo %A Giorgio Vinciguerra %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-ferragina20a %I PMLR %P 3123--3132 %U https://proceedings.mlr.press/v119/ferragina20a.html %V 119 %X A recent trend in algorithm design consists of augmenting classic data structures with machine learning models, which are better suited to reveal and exploit patterns and trends in the input data so to achieve outstanding practical improvements in space occupancy and time efficiency. This is especially known in the context of indexing data structures where, despite few attempts in evaluating their asymptotic efficiency, theoretical results are yet missing in showing that learned indexes are provably better than classic indexes, such as B+ trees and their variants. In this paper, we present the first mathematically-grounded answer to this open problem. We obtain this result by discovering and exploiting a link between the original problem and a mean exit time problem over a proper stochastic process which, we show, is related to the space and time occupancy of those learned indexes. Our general result is then specialised to five well-known distributions: Uniform, Lognormal, Pareto, Exponential, and Gamma; and it is corroborated in precision and robustness by a large set of experiments.
APA
Ferragina, P., Lillo, F. & Vinciguerra, G.. (2020). Why Are Learned Indexes So Effective?. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3123-3132 Available from https://proceedings.mlr.press/v119/ferragina20a.html.

Related Material