Fraud Detection with Density Estimation Trees

Parikshit Ram, Alexander G. Gray
Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance, PMLR 71:85-94, 2018.

Abstract

We consider the problem of anomaly detection in finance. An application of interest is the detection of first-time fraud, where new classes of fraud need to be detected using unsupervised learning to augment the existing supervised learning techniques that capture known classes of fraud. This domain usually has the following requirements: (i) the ability to handle data containing both numerical and categorical features, (ii) very low-latency, real-time detection, and (iii) interpretability. We propose the use of a variant of density estimation trees (DETs) (Ram and Gray, 2011) for anomaly detection using distributional properties of the data. We formally present a procedure for handling data sets with both categorical and numerical features, whereas Ram and Gray (2011) focused mainly on data sets with all numerical features. DETs have demonstrably fast prediction times, orders of magnitude faster than other density estimators such as kernel density estimators. The density and the anomalousness score for any new item can be computed very efficiently. Beyond this flexibility and efficiency, DETs are also quite interpretable: for the task of anomaly detection, DETs can generate a set of decision rules that lead to high anomalousness scores. We empirically demonstrate these capabilities on a publicly available fraud data set.
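The abstract describes how a DET scores anomalies: the tree partitions the feature space into boxes, each leaf's density is the fraction of training points it holds divided by its volume, and a low density means a high anomalousness score. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It handles numerical features only (the paper's procedure for categorical features is not reproduced), grows splits greedily using the integrated-squared-error leaf risk -(|t|/N)^2 / V_t, stops at hypothetical `min_leaf` and `max_depth` limits rather than pruning, and uses -log f(x) as an illustrative anomaly score rather than the paper's definition. The names `fit`, `grow`, `score`, and `pad` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class Node:
    lo: np.ndarray                 # lower corner of the node's bounding box
    hi: np.ndarray                 # upper corner of the node's bounding box
    count: int                     # number of training points in the node
    dim: Optional[int] = None      # split dimension; None marks a leaf
    thresh: float = 0.0            # left child holds points with x[dim] <= thresh
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def risk(count: int, n: int, volume: float) -> float:
    # ISE-based leaf risk for growing a DET: -(|t| / N)^2 / V_t (lower is better).
    return -((count / n) ** 2) / volume


def grow(X: np.ndarray, lo: np.ndarray, hi: np.ndarray, n: int,
         min_leaf: int, max_depth: int, depth: int = 0) -> Node:
    node = Node(lo=lo, hi=hi, count=len(X))
    if len(X) < 2 * min_leaf or depth >= max_depth:
        return node
    vol = float(np.prod(hi - lo))
    parent = risk(len(X), n, vol)
    best_gain, best = 0.0, None
    for d in range(X.shape[1]):
        xs = np.sort(X[:, d])
        width = hi[d] - lo[d]
        # Candidate thresholds lie midway between consecutive sorted values,
        # keeping at least min_leaf points on each side of the split.
        for i in range(min_leaf, len(xs) - min_leaf + 1):
            t = 0.5 * (xs[i - 1] + xs[i])
            if not (xs[i - 1] < t < xs[i]):
                continue  # tied values cannot be separated at this position
            v_left = vol * (t - lo[d]) / width
            v_right = vol * (hi[d] - t) / width
            gain = parent - risk(i, n, v_left) - risk(len(X) - i, n, v_right)
            if gain > best_gain:
                best_gain, best = gain, (d, t)
    if best is None:
        return node
    d, t = best
    mask = X[:, d] <= t
    hi_left, lo_right = hi.copy(), lo.copy()
    hi_left[d], lo_right[d] = t, t
    node.dim, node.thresh = d, float(t)
    node.left = grow(X[mask], lo, hi_left, n, min_leaf, max_depth, depth + 1)
    node.right = grow(X[~mask], lo_right, hi, n, min_leaf, max_depth, depth + 1)
    return node


def fit(X, min_leaf: int = 5, max_depth: int = 8, pad: float = 1e-6) -> Tuple[Node, int]:
    X = np.asarray(X, dtype=float)
    # Pad the bounding box so no dimension has zero width (and zero volume).
    lo, hi = X.min(axis=0) - pad, X.max(axis=0) + pad
    return grow(X, lo, hi, len(X), min_leaf, max_depth), len(X)


def score(root: Node, n: int, x) -> Tuple[float, List[str]]:
    """Return (-log density at x, the root-to-leaf rule that x satisfies)."""
    x = np.asarray(x, dtype=float)
    if np.any(x < root.lo) or np.any(x > root.hi):
        return math.inf, ["outside the training bounding box"]
    node, rule = root, []
    while node.dim is not None:
        if x[node.dim] <= node.thresh:
            rule.append(f"x[{node.dim}] <= {node.thresh:.3g}")
            node = node.left
        else:
            rule.append(f"x[{node.dim}] > {node.thresh:.3g}")
            node = node.right
    # Piecewise-constant DET density: leaf count / (N * leaf volume).
    density = node.count / (n * float(np.prod(node.hi - node.lo)))
    return -math.log(density), rule


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    root, n = fit(X)
    for q in ([0.0, 0.0], [2.0, -2.0]):
        s, rule = score(root, n, q)
        print(q, round(s, 3), " AND ".join(rule))
```

Because the density is constant within each leaf, scoring a new item in this sketch is a single root-to-leaf walk, which illustrates why DET prediction can be fast. The list of split conditions returned with the score is one way to read off the kind of decision rule the abstract mentions.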

Cite this Paper


BibTeX
@InProceedings{pmlr-v71-ram18a,
  title = {Fraud Detection with Density Estimation Trees},
  author = {Ram, Parikshit and Gray, Alexander G.},
  booktitle = {Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance},
  pages = {85--94},
  year = {2018},
  editor = {Anandakrishnan, Archana and Kumar, Senthil and Statnikov, Alexander and Faruquie, Tanveer and Xu, Di},
  volume = {71},
  series = {Proceedings of Machine Learning Research},
  month = {14 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v71/ram18a/ram18a.pdf},
  url = {https://proceedings.mlr.press/v71/ram18a.html},
  abstract = {We consider the problem of anomaly detection in finance. An application of interest is the detection of first-time fraud where new classes of fraud need to be detected using unsupervised learning to augment the existing supervised learning techniques that capture known classes of frauds. This domain usually has the following requirements - (i) the ability to handle data containing both numerical and categorical features, (ii) very low latency real-time detection, and (iii) interpretability. We propose the use of a variant of density estimation trees (DETs) (Ram and Gray, 2011) for anomaly detection using distributional properties of the data. We formally present a procedure for handling data sets with both categorical and numerical features while Ram and Gray (2011) focused mainly on data sets with all numerical features. DETs have demonstrably fast prediction times, orders of magnitude faster than other density estimators like kernel density estimators. The estimation of the density and the anomalousness score for any new item can be done very efficiently. Beyond the flexibility and efficiency, DETs are also quite interpretable. For the task of anomaly detection, DETs can generate a set of decision rules that lead to high anomalousness scores. We empirically demonstrate these capabilities on a publicly available fraud data set.}
}
Endnote
%0 Conference Paper
%T Fraud Detection with Density Estimation Trees
%A Parikshit Ram
%A Alexander G. Gray
%B Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance
%C Proceedings of Machine Learning Research
%D 2018
%E Archana Anandakrishnan
%E Senthil Kumar
%E Alexander Statnikov
%E Tanveer Faruquie
%E Di Xu
%F pmlr-v71-ram18a
%I PMLR
%P 85--94
%U https://proceedings.mlr.press/v71/ram18a.html
%V 71
%X We consider the problem of anomaly detection in finance. An application of interest is the detection of first-time fraud where new classes of fraud need to be detected using unsupervised learning to augment the existing supervised learning techniques that capture known classes of frauds. This domain usually has the following requirements - (i) the ability to handle data containing both numerical and categorical features, (ii) very low latency real-time detection, and (iii) interpretability. We propose the use of a variant of density estimation trees (DETs) (Ram and Gray, 2011) for anomaly detection using distributional properties of the data. We formally present a procedure for handling data sets with both categorical and numerical features while Ram and Gray (2011) focused mainly on data sets with all numerical features. DETs have demonstrably fast prediction times, orders of magnitude faster than other density estimators like kernel density estimators. The estimation of the density and the anomalousness score for any new item can be done very efficiently. Beyond the flexibility and efficiency, DETs are also quite interpretable. For the task of anomaly detection, DETs can generate a set of decision rules that lead to high anomalousness scores. We empirically demonstrate these capabilities on a publicly available fraud data set.
APA
Ram, P., & Gray, A. G. (2018). Fraud Detection with Density Estimation Trees. Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance, in Proceedings of Machine Learning Research 71:85-94. Available from https://proceedings.mlr.press/v71/ram18a.html.
