Ensemble Learned Bloom Filters: Two Oracles are Better than One

Ming Lin, Lin Chen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:37762-37779, 2025.

Abstract

Bloom filters (BF) are space-efficient probabilistic data structures for approximate membership testing. Boosted by the proliferation of machine learning, learned Bloom filters (LBF) were recently proposed by augmenting the canonical BFs with a learned oracle as a pre-filter, the size of which is crucial to the compactness of the overall system. In this paper, inspired by ensemble learning, we depart from the state-of-the-art single-oracle LBF structure by demonstrating that, by leveraging multiple learning oracles of smaller size and carefully optimizing the accompanied backup filters, we can significantly boost the performance of LBF under the same space budget. We then design and optimize ensemble learned Bloom filters for mutually independent and correlated learning oracles respectively. We also empirically demonstrate the performance improvement of our propositions under three practical data analysis tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lin25c, title = {Ensemble Learned Bloom Filters: Two Oracles are Better than One}, author = {Lin, Ming and Chen, Lin}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {37762--37779}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lin25c/lin25c.pdf}, url = {https://proceedings.mlr.press/v267/lin25c.html}, abstract = {Bloom filters (BF) are space-efficient probabilistic data structures for approximate membership testing. Boosted by the proliferation of machine learning, learned Bloom filters (LBF) were recently proposed by augmenting the canonical BFs with a learned oracle as a pre-filter, the size of which is crucial to the compactness of the overall system. In this paper, inspired by ensemble learning, we depart from the state-of-the-art single-oracle LBF structure by demonstrating that, by leveraging multiple learning oracles of smaller size and carefully optimizing the accompanied backup filters, we can significantly boost the performance of LBF under the same space budget. We then design and optimize ensemble learned Bloom filters for mutually independent and correlated learning oracles respectively. We also empirically demonstrate the performance improvement of our propositions under three practical data analysis tasks.} }
Endnote
%0 Conference Paper %T Ensemble Learned Bloom Filters: Two Oracles are Better than One %A Ming Lin %A Lin Chen %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-lin25c %I PMLR %P 37762--37779 %U https://proceedings.mlr.press/v267/lin25c.html %V 267 %X Bloom filters (BF) are space-efficient probabilistic data structures for approximate membership testing. Boosted by the proliferation of machine learning, learned Bloom filters (LBF) were recently proposed by augmenting the canonical BFs with a learned oracle as a pre-filter, the size of which is crucial to the compactness of the overall system. In this paper, inspired by ensemble learning, we depart from the state-of-the-art single-oracle LBF structure by demonstrating that, by leveraging multiple learning oracles of smaller size and carefully optimizing the accompanied backup filters, we can significantly boost the performance of LBF under the same space budget. We then design and optimize ensemble learned Bloom filters for mutually independent and correlated learning oracles respectively. We also empirically demonstrate the performance improvement of our propositions under three practical data analysis tasks.
APA
Lin, M. & Chen, L.. (2025). Ensemble Learned Bloom Filters: Two Oracles are Better than One. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:37762-37779 Available from https://proceedings.mlr.press/v267/lin25c.html.

Related Material