Scoring anomalies: a M-estimation formulation

Stéphan Clémençon; Jérémie Jakubowicz

Scoring anomalies: a M-estimation formulation

Stéphan Clémençon, Jérémie Jakubowicz

Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:659-667, 2013.

Abstract

It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. Whereas in the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more “abnormal” as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications, it is desirable to dispose of a scalar valued “scoring” function allowing for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as a M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled and preliminary statistical results related to the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.

Cite this Paper

BibTeX


@InProceedings{pmlr-v31-clemencon13a,
  title = 	 {Scoring anomalies: a {M}-estimation formulation},
  author = 	 {Clémençon, Stéphan and Jakubowicz, Jérémie},
  booktitle = 	 {Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {659--667},
  year = 	 {2013},
  editor = 	 {Carvalho, Carlos M. and Ravikumar, Pradeep},
  volume = 	 {31},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Scottsdale, Arizona, USA},
  month = 	 {29 Apr--01 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v31/clemencon13a.pdf},
  url = 	 {https://proceedings.mlr.press/v31/clemencon13a.html},
  abstract = 	 {It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. Whereas in the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more “abnormal” as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications, it is desirable to dispose of a scalar valued “scoring” function allowing for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as a M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled and preliminary statistical results related to the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.}
}

Endnote

%0 Conference Paper
%T Scoring anomalies: a M-estimation formulation
%A Stéphan Clémençon
%A Jérémie Jakubowicz
%B Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2013
%E Carlos M. Carvalho
%E Pradeep Ravikumar	
%F pmlr-v31-clemencon13a
%I PMLR
%P 659--667
%U https://proceedings.mlr.press/v31/clemencon13a.html
%V 31
%X It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. Whereas in the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more “abnormal” as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications, it is desirable to dispose of a scalar valued “scoring” function allowing for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as a M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled and preliminary statistical results related to the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.

RIS


TY  - CPAPER
TI  - Scoring anomalies: a M-estimation formulation
AU  - Stéphan Clémençon
AU  - Jérémie Jakubowicz
BT  - Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics
DA  - 2013/04/29
ED  - Carlos M. Carvalho
ED  - Pradeep Ravikumar	
ID  - pmlr-v31-clemencon13a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 31
SP  - 659
EP  - 667
L1  - http://proceedings.mlr.press/v31/clemencon13a.pdf
UR  - https://proceedings.mlr.press/v31/clemencon13a.html
AB  - It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. Whereas in the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more “abnormal” as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications, it is desirable to dispose of a scalar valued “scoring” function allowing for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as a M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled and preliminary statistical results related to the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.
ER  -

APA


Clémençon, S. & Jakubowicz, J.. (2013). Scoring anomalies: a M-estimation formulation. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 31:659-667 Available from https://proceedings.mlr.press/v31/clemencon13a.html.

Related Material

Download PDF