SEAD: Unsupervised Ensemble of Streaming Anomaly Detectors

Saumya Gaurang Shah, Abishek Sankararaman, Balakrishnan Murali Narayanaswamy, Vikramank Singh
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:54167-54185, 2025.

Abstract

Can we efficiently choose the best Anomaly Detection (AD) algorithm for a data-stream without requiring anomaly labels? Streaming anomaly detection is hard. SOTA AD algorithms are sensitive to their hyperparameters, and no single method works well on all datasets. The best algorithm/hyperparameter combination for a given data-stream can change over time with data drift. ‘What is an anomaly?’ is often application-, context-, and dataset-dependent. We propose SEAD (Streaming Ensemble of Anomaly Detectors), the first model selection algorithm for streaming, unsupervised AD. All prior AD model selection algorithms are either supervised or only work in the offline setting, when all data from the test set is available upfront. We show that SEAD is (i) unsupervised, i.e., requires no true anomaly labels, (ii) efficiently implementable in a streaming setting, (iii) agnostic to the choice of the base algorithms among which it chooses, and (iv) adaptive to non-stationarity in the data-stream. Experiments on 14 non-trivial public datasets and an internal dataset corroborate our claims.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-shah25c,
  title = {{SEAD}: Unsupervised Ensemble of Streaming Anomaly Detectors},
  author = {Shah, Saumya Gaurang and Sankararaman, Abishek and Narayanaswamy, Balakrishnan Murali and Singh, Vikramank},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {54167--54185},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shah25c/shah25c.pdf},
  url = {https://proceedings.mlr.press/v267/shah25c.html},
  abstract = {Can we efficiently choose the best Anomaly Detection (AD) algorithm for a data-stream without requiring anomaly labels? Streaming anomaly detection is hard. SOTA AD algorithms are sensitive to their hyperparameters and no single method works well on all datasets. The best algorithm/hyper-parameter combination for a given data-stream can change over time with data drift. ’What is an anomaly?’ is often application, context and dataset dependent. We propose SEAD (Streaming Ensemble of Anomaly Detectors), the first model selection algorithm for streaming, unsupervised AD. All prior AD model selection algorithms are either supervised, or only work in the offline setting when all data from the test set is available upfront. We show that SEAD is (i) unsupervised, i.e., requires no true anomaly labels, (ii) efficiently implementable in a streaming setting, (iii) agnostic to the choice of the base algorithms among which it chooses from, and (iv) adaptive to non-stationarity in the data-stream. Experiments on 14 non-trivial public datasets and an internal dataset corroborate our claims.}
}
Endnote
%0 Conference Paper
%T SEAD: Unsupervised Ensemble of Streaming Anomaly Detectors
%A Saumya Gaurang Shah
%A Abishek Sankararaman
%A Balakrishnan Murali Narayanaswamy
%A Vikramank Singh
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shah25c
%I PMLR
%P 54167--54185
%U https://proceedings.mlr.press/v267/shah25c.html
%V 267
%X Can we efficiently choose the best Anomaly Detection (AD) algorithm for a data-stream without requiring anomaly labels? Streaming anomaly detection is hard. SOTA AD algorithms are sensitive to their hyperparameters and no single method works well on all datasets. The best algorithm/hyper-parameter combination for a given data-stream can change over time with data drift. ’What is an anomaly?’ is often application, context and dataset dependent. We propose SEAD (Streaming Ensemble of Anomaly Detectors), the first model selection algorithm for streaming, unsupervised AD. All prior AD model selection algorithms are either supervised, or only work in the offline setting when all data from the test set is available upfront. We show that SEAD is (i) unsupervised, i.e., requires no true anomaly labels, (ii) efficiently implementable in a streaming setting, (iii) agnostic to the choice of the base algorithms among which it chooses from, and (iv) adaptive to non-stationarity in the data-stream. Experiments on 14 non-trivial public datasets and an internal dataset corroborate our claims.
APA
Shah, S.G., Sankararaman, A., Narayanaswamy, B.M. & Singh, V. (2025). SEAD: Unsupervised Ensemble of Streaming Anomaly Detectors. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:54167-54185. Available from https://proceedings.mlr.press/v267/shah25c.html.
