Functional Isolation Forest

Guillaume Staerman, Pavlo Mozharovskyi, Stephan Clémençon, Florence d’Alché-Buc
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:332-347, 2019.

Abstract

For the purpose of monitoring the behavior of complex infrastructures (e.g. aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time in order to quickly detect anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular Isolation Forest (IF) approach to Anomaly Detection, originally dedicated to finite dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to progressively isolate any trajectory from the others, a key ingredient of the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a Functional Statistical Depth function is defined and discussed, as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the proposed extension.

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-staerman19a,
  title     = {Functional Isolation Forest},
  author    = {Staerman, Guillaume and Mozharovskyi, Pavlo and Cl\'emen\c{c}on, Stephan and d'Alch\'e-Buc, Florence},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages     = {332--347},
  year      = {2019},
  editor    = {Lee, Wee Sun and Suzuki, Taiji},
  volume    = {101},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v101/staerman19a/staerman19a.pdf},
  url       = {https://proceedings.mlr.press/v101/staerman19a.html},
  abstract  = {For the purpose of monitoring the behavior of complex infrastructures (\textit{e.g.} aircrafts, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time to detect quickly the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular {\scshape Isolation Forest} (IF) approach to Anomaly Detection, originally dedicated to finite dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to isolate progressively any trajectory from the others, a key ingredient to the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a \textit{Functional Statistical Depth} function is defined and discussed, as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the extension proposed.}
}
Endnote
%0 Conference Paper
%T Functional Isolation Forest
%A Guillaume Staerman
%A Pavlo Mozharovskyi
%A Stephan Clémençon
%A Florence d’Alché-Buc
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-staerman19a
%I PMLR
%P 332--347
%U https://proceedings.mlr.press/v101/staerman19a.html
%V 101
%X For the purpose of monitoring the behavior of complex infrastructures (\textit{e.g.} aircrafts, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time to detect quickly the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular {\scshape Isolation Forest} (IF) approach to Anomaly Detection, originally dedicated to finite dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to isolate progressively any trajectory from the others, a key ingredient to the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a \textit{Functional Statistical Depth} function is defined and discussed, as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the extension proposed.
APA
Staerman, G., Mozharovskyi, P., Clémençon, S. & d’Alché-Buc, F. (2019). Functional Isolation Forest. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:332-347. Available from https://proceedings.mlr.press/v101/staerman19a.html.

Related Material