Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics

Myrto Limnios, Nathan Noiry, Stephan Clémençon
Proceedings of the Third International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 154:63-75, 2021.

Abstract

The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations have the same behavior, while a probably small proportion of them are abnormal. Detecting the latter, defined as outliers, is one of the major challenges for machine learning applications (e.g. in fraud detection or in predictive maintenance). In this paper, we propose a methodology addressing the problem of outlier detection, by learning a data-driven scoring function defined on the feature space which reflects the degree of abnormality of the observations. This scoring function is learnt through a well-designed binary classification problem whose empirical criterion takes the form of a two-sample linear rank statistic, for which theoretical results are available. We illustrate our methodology with encouraging preliminary numerical experiments.
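
For readers unfamiliar with the criterion named in the abstract, the following is a minimal sketch of a generic two-sample linear rank statistic in its textbook form: the sum, over the first sample, of a score-generating function applied to each normalized rank in the pooled sample. It is not the authors' procedure. The function name, the default identity choice of `phi`, and the toy scores are hypothetical and chosen only for illustration, and the sketch assumes no ties among the scores.

```python
from typing import Callable, Sequence


def two_sample_linear_rank_statistic(
    scores_first: Sequence[float],
    scores_second: Sequence[float],
    phi: Callable[[float], float] = lambda u: u,  # identity: rank-sum (Wilcoxon) case
) -> float:
    """Sum of phi(rank / (N + 1)) over the first sample.

    Ranks are computed in the pooled sample of size N = n + m.
    Assumes no ties among the scores (illustrative sketch only).
    """
    pooled = list(scores_first) + list(scores_second)
    n_total = len(pooled)
    total = 0.0
    for s in scores_first:
        # Rank of s = number of pooled scores less than or equal to s.
        rank = sum(1 for t in pooled if t <= s)
        total += phi(rank / (n_total + 1))
    return total


# Toy scores (hypothetical): the first sample tends to score higher.
first = [0.9, 0.8, 0.4]
second = [0.1, 0.3, 0.5, 0.2]
w = two_sample_linear_rank_statistic(first, second)
```

With the identity `phi`, the statistic is the rank sum of the first sample divided by N + 1, which is an affine function of the empirical AUC; other choices of `phi` put more weight on the highest ranks.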

Cite this Paper


BibTeX
@InProceedings{pmlr-v154-limnios21a,
  title = {Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics},
  author = {Limnios, Myrto and Noiry, Nathan and Cl\'emen\c{c}on, Stephan},
  booktitle = {Proceedings of the Third International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages = {63--75},
  year = {2021},
  editor = {Moniz, Nuno and Branco, Paula and Torgo, Luis and Japkowicz, Nathalie and Woźniak, Michał and Wang, Shuo},
  volume = {154},
  series = {Proceedings of Machine Learning Research},
  month = {17 Sep},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v154/limnios21a/limnios21a.pdf},
  url = {https://proceedings.mlr.press/v154/limnios21a.html},
  abstract = {The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations have the same behavior, while a probable small proportion of these observations are abnormal. Detecting the latter, defined as outliers, is one of the major challenges for machine learning applications (e.g. in fraud detection or in predictive maintenance). In this paper, we propose a methodology addressing the problem of outlier detection, by learning a data-driven scoring function defined on the feature space which reflects the degree of abnormality of the observations. This scoring function is learnt through a well-designed binary classification problem whose empirical criterion takes the form of a two-sample linear rank statistics on which theoretical results are available. We illustrate our methodology with preliminary encouraging numerical experiments.}
}
Endnote
%0 Conference Paper
%T Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics
%A Myrto Limnios
%A Nathan Noiry
%A Stephan Clémençon
%B Proceedings of the Third International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2021
%E Nuno Moniz
%E Paula Branco
%E Luis Torgo
%E Nathalie Japkowicz
%E Michał Woźniak
%E Shuo Wang
%F pmlr-v154-limnios21a
%I PMLR
%P 63--75
%U https://proceedings.mlr.press/v154/limnios21a.html
%V 154
%X The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations have the same behavior, while a probable small proportion of these observations are abnormal. Detecting the latter, defined as outliers, is one of the major challenges for machine learning applications (e.g. in fraud detection or in predictive maintenance). In this paper, we propose a methodology addressing the problem of outlier detection, by learning a data-driven scoring function defined on the feature space which reflects the degree of abnormality of the observations. This scoring function is learnt through a well-designed binary classification problem whose empirical criterion takes the form of a two-sample linear rank statistics on which theoretical results are available. We illustrate our methodology with preliminary encouraging numerical experiments.
APA
Limnios, M., Noiry, N. & Clémençon, S. (2021). Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics. Proceedings of the Third International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 154:63-75. Available from https://proceedings.mlr.press/v154/limnios21a.html.
