Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability

Sepanta Zeighami, Cyrus Shahabi
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:58283-58305, 2024.

Abstract

The use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, has been shown to provide substantial performance benefits. However, when datasets change and the data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than that of non-learned alternatives. This, together with a lack of theoretical understanding of learned methods, undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models on dynamic datasets for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on model performance that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops the distribution learnability framework and novel theoretical tools that build the foundation for future analysis of learned database operations.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zeighami24a,
  title     = {Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability},
  author    = {Zeighami, Sepanta and Shahabi, Cyrus},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {58283--58305},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zeighami24a/zeighami24a.pdf},
  url       = {https://proceedings.mlr.press/v235/zeighami24a.html},
  abstract  = {Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets, for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on the performance of the models that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops the distribution learnability framework and novel theoretical tools which build the foundation for the analysis of learned database operations in the future.}
}
Endnote
%0 Conference Paper
%T Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability
%A Sepanta Zeighami
%A Cyrus Shahabi
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zeighami24a
%I PMLR
%P 58283--58305
%U https://proceedings.mlr.press/v235/zeighami24a.html
%V 235
%X Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets, for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on the performance of the models that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops the distribution learnability framework and novel theoretical tools which build the foundation for the analysis of learned database operations in the future.
APA
Zeighami, S. & Shahabi, C. (2024). Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:58283-58305. Available from https://proceedings.mlr.press/v235/zeighami24a.html.