- title: 'Anomaly Detection in Finance: Editors’ Introduction'
  volume: 71
  URL: https://proceedings.mlr.press/v71/anandakrishnan18a.html
  PDF: http://proceedings.mlr.press/v71/anandakrishnan18a/anandakrishnan18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-anandakrishnan18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 1-7
  id: anandakrishnan18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 1
  lastpage: 7
  published: 2018-01-07 00:00:00 +0000
- title: 'Uncovering Unknown Unknowns in Financial Services Big Data by Unsupervised Methodologies: Present and Future trends'
  abstract: 'Currently, unknown unknowns in high-dimensional big data environments can go unnoticed for a long period of time. The failure to detect anomalies in critical infrastructure data can result in extensive financial, operational, reputational and life-threatening consequences. In this paper, we describe algorithms for automatic and unsupervised anomaly detection that do not necessitate domain expertise, signatures, rules, patterns or semantic understanding of the features. We propose several new methodologies for anomaly detection to protect critical infrastructures, with emphasis on finance, spanning from theory to actionable technology. Although anomalies can originate from several sources, we also show that cyber threat, financial and operational malfunction are converging into a single detection paradigm. Performance comparison between different algorithms (ours and others) is presented as well as examples from real use cases.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/shabat18a.html
  PDF: http://proceedings.mlr.press/v71/shabat18a/shabat18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-shabat18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Gil
    family: Shabat
  - given: David
    family: Segev
  - given: Amir
    family: Averbuch
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 8-19
  id: shabat18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 8
  lastpage: 19
  published: 2018-01-07 00:00:00 +0000
- title: 'Analytical Techniques for Anomaly Detection Through Features, Signal-Noise Separation and Partial-Value Association'
  abstract: 'This paper presents three analytical techniques for anomaly detection which can play an important role for anomaly detection in finance: the feature extraction technique, the signal-noise separation technique, and the Partial-Value Association Discovery (PVAD) algorithm. The feature extraction technique emphasizes the importance of extracting various data features which may be better at separating anomalies from norms than using raw data. The signal-noise separation technique considers an anomaly as the signal to detect and the norm as the noise and employs both anomaly models and norm models to detect anomalies accurately. The PVAD algorithm enables learning from data to build anomaly patterns and norm patterns which capture both partial-value and full-value variable relations as well as interactive, concurrent effects of multiple variables.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/ye18a.html
  PDF: http://proceedings.mlr.press/v71/ye18a/ye18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-ye18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Nong
    family: Ye
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 20-32
  id: ye18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 20
  lastpage: 32
  published: 2018-01-07 00:00:00 +0000
- title: 'Spotlighting Anomalies using Frequent Patterns'
  abstract: 'Approaches for anomaly detection based on frequent pattern mining follow the paradigm: if an instance contains more frequent patterns, it means that this data instance is unlikely to be an anomaly. This concept can be used in the financial industry to reveal contextual anomalies. The main contribution of this paper is an approach that includes a novel formula for computation of anomaly scores. We evaluated the proposed approach on baseline datasets and present a use case on a real-world financial dataset. We also propose a way to explain the anomaly to the users. Implementations of the evaluated algorithms and experiments are available online in R.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/kuchar18a.html
  PDF: http://proceedings.mlr.press/v71/kuchar18a/kuchar18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-kuchar18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Jaroslav
    family: Kuchar
  - given: Vojtech
    family: Svatek
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 33-42
  id: kuchar18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 33
  lastpage: 42
  published: 2018-01-07 00:00:00 +0000
- title: 'Ensemble-Based Anomaly Detection using Cooperative Learning'
  abstract: 'Using the same process and functionality to solve both clustering and outlier discovery is highly desired. Such integration will be of great benefit to discover outliers in data and consequently obtain better clustering results after eliminating the set of outliers. It is known that the capability of discovering outliers using clustering-based techniques is mainly based on the quality of the adopted clustering. In this paper, a novel Cooperative Clustering Outlier Detection (CCOD) algorithm is presented. It involves multiple clustering techniques; the goal of the cooperative approach is to discover those outliers that are not detected by the single clustering-based outlier detection approaches using the methodology of cooperation. Undertaken experimental results show that the detection accuracy of the cooperative technique is better than that of the typical clustering-based FindCBLOF method over a number of artificial, gene expression and text document datasets.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/kashef18a.html
  PDF: http://proceedings.mlr.press/v71/kashef18a/kashef18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-kashef18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Rasha F.
    family: Kashef
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 43-55
  id: kashef18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 43
  lastpage: 55
  published: 2018-01-07 00:00:00 +0000
- title: 'Real-time anomaly detection system for time series at scale'
  abstract: 'This paper describes the design considerations and general outline of an anomaly detection system used by Anodot. We present results of the system on a large set of metrics collected from multiple companies.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/toledano18a.html
  PDF: http://proceedings.mlr.press/v71/toledano18a/toledano18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-toledano18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Meir
    family: Toledano
  - given: Ira
    family: Cohen
  - given: Yonatan
    family: Ben-Simhon
  - given: Inbal
    family: Tadeski
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 56-65
  id: toledano18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 56
  lastpage: 65
  published: 2018-01-07 00:00:00 +0000
- title: 'Collective Fraud Detection Capturing Inter-Transaction Dependency'
  abstract: 'In e-commerce, different payment transactions have different levels of risk. Risk is generally higher for digital goods, but it also differs based on product and its popularity, the offer type (packaged game, virtual currency to a game or subscription service), storefront and geography. Existing fraud policies and models make decisions independently for each transaction based on transaction attributes, payment velocities, user characteristics, and other relevant information. However, suspicious transactions may still evade detection and hence we propose a novel approach leveraging a graph based perspective to uncover relationships among suspicious transactions, i.e., inter-transaction dependency. Our focus is to detect suspicious transactions by capturing common fraudulent behaviors that would not be considered suspicious when being considered in isolation. In this paper, we present HitFraud that leverages heterogeneous information networks for collective fraud detection by exploring correlated and fast evolving fraudulent behaviors. First, a heterogeneous information network is designed to link entities of interest in the transaction database via different semantics. Then, graph based features are efficiently discovered from the network exploiting the concept of meta-paths, and decisions on frauds are made collectively on test instances. Experiments on real-world payment transaction data from Electronic Arts demonstrate that the prediction performance is effectively boosted by HitFraud where the computation of meta-path based features is largely optimized. Notably, recall can be improved up to 7.93% and F-score 4.62% compared to baselines.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/cao18a.html
  PDF: http://proceedings.mlr.press/v71/cao18a/cao18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-cao18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Bokai
    family: Cao
  - given: Mia
    family: Mao
  - given: Siim
    family: Viidu
  - given: Philip
    family: Yu
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 66-75
  id: cao18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 66
  lastpage: 75
  published: 2018-01-07 00:00:00 +0000
- title: 'PD-FDS: Purchase Density based Online Credit Card Fraud Detection System'
  abstract: 'Credit card fraud detection is an endless war between fraudsters and payment service providers. Indeed, annual global financial loss by credit card frauds has increased. Fraudsters have been organized and systematized, attempting to find weak points of existing fraud detection systems (FDS). State-of-the-art FDS approaches utilize already existing fraud cases, which can result in different FDS by payment service providers. Therefore, a new payment service provider may not have room for installing an FDS due to the lack of fraudulent cases. Moreover, credit card transactions contain the legitimate owner’s personal information, which can be exposed to an honest but curious fraud analyst. In this paper, we propose a purchase density based FDS (PD-FDS) that uses three features which are not related to personal information. PD-FDS does not require already existing fraudulent transactions and also shows low false positive rate (<0.01).'
  volume: 71
  URL: https://proceedings.mlr.press/v71/ki18a.html
  PDF: http://proceedings.mlr.press/v71/ki18a/ki18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-ki18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Youngjoon
    family: Ki
  - given: Ji Won
    family: Yoon
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 76-84
  id: ki18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 76
  lastpage: 84
  published: 2018-01-07 00:00:00 +0000
- title: 'Fraud Detection with Density Estimation Trees'
  abstract: 'We consider the problem of anomaly detection in finance. An application of interest is the detection of first-time fraud where new classes of fraud need to be detected using unsupervised learning to augment the existing supervised learning techniques that capture known classes of frauds. This domain usually has the following requirements - (i) the ability to handle data containing both numerical and categorical features, (ii) very low latency real-time detection, and (iii) interpretability. We propose the use of a variant of density estimation trees (DETs) (Ram and Gray, 2011) for anomaly detection using distributional properties of the data. We formally present a procedure for handling data sets with both categorical and numerical features while Ram and Gray (2011) focused mainly on data sets with all numerical features. DETs have demonstrably fast prediction times, orders of magnitude faster than other density estimators like kernel density estimators. The estimation of the density and the anomalousness score for any new item can be done very efficiently. Beyond the flexibility and efficiency, DETs are also quite interpretable. For the task of anomaly detection, DETs can generate a set of decision rules that lead to high anomalousness scores. We empirically demonstrate these capabilities on a publicly available fraud data set.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/ram18a.html
  PDF: http://proceedings.mlr.press/v71/ram18a/ram18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-ram18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Parikshit
    family: Ram
  - given: Alexander G.
    family: Gray
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 85-94
  id: ram18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 85
  lastpage: 94
  published: 2018-01-07 00:00:00 +0000
- title: 'An Automated System for Data Attribute Anomaly Detection'
  abstract: 'We introduce DataQC, an automated system for data attribute anomaly detection for the purpose of improving data quality. Large organizations can have non-standardized or inconsistent data quality checking practices being followed across different departments. The key motivation behind the development of such a system is to 1) achieve a standard for anomaly detection 2) facilitate quick identification of obvious anomalies 3) reduce human judgment in data anomaly detection 4) facilitate prompt corrective action by data scientists. Most of the methods and techniques used during the development of this system are well known and have been widely used by finance professionals who deal with data. Our contribution is to provide a system that improves overall efficiency, interpretability, and objectivity for detecting data attribute anomalies.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/love18a.html
  PDF: http://proceedings.mlr.press/v71/love18a/love18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-love18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Love
  - given: Nalin
    family: Aggarwal
  - given: Alexander
    family: Statnikov
  - given: Chao
    family: Yuan
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 95-101
  id: love18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 95
  lastpage: 101
  published: 2018-01-07 00:00:00 +0000
- title: 'Binned Kernels for Anomaly Detection in Multi-timescale Data using Gaussian Processes'
  abstract: 'Financial services and technology companies invest significantly in monitoring their complex technology infrastructures to allow for quick responses to technology failures. Because of the volume and velocity of signals monitored (e.g., customer transaction volume, API calls, server CPU utilization, etc.), they require sophisticated models of normal system behavior to determine when a component falls into an anomalous state. Gaussian processes (GPs) are flexible, Bayesian nonparametric models that have successfully been used for time series forecasting, interpolation, and anomaly detection in complex data sets. Despite the growing use of GPs for time series analysis in the literature, these methods scale poorly with the size of the data. In particular, data sets containing multiple timescales can pose a problem for GPs, as they can require a large number of points for training. We describe a novel method for including long and short timescale information without including an impractical number of data points through the use of a binned process, defined as the definite integral over a latent Gaussian process. This results in a binned covariance function for the time series, which we use to fit and forecast data at multiple resolutions. The resulting models achieve higher accuracy with fewer data points than their non-binned counterparts, and are more robust to long tailed noise, heteroskedasticity, and data artifacts.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/adelsberg18a.html
  PDF: http://proceedings.mlr.press/v71/adelsberg18a/adelsberg18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-adelsberg18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Matthew
    family: Adelsberg
  - given: Christian
    family: Schwantes
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 102-113
  id: adelsberg18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 102
  lastpage: 113
  published: 2018-01-07 00:00:00 +0000
- title: 'Deep Learning to Detect Medical Treatment Fraud'
  abstract: 'Excessive treatment or testing of patients is considered one of the most ubiquitous and persistent forms of waste and abuse in healthcare. Some estimates show excessive treatment to be as high as 8% of all medical insurance provider expenditures. It is very difficult to identify an extraneous or unnecessary procedure or drug because there is such a wide variety of diagnoses and an equally large number of treatment options. Our goal in this paper was to show how RBMs can be utilized effectively to ferret out abnormal treatments where the prescribed treatment for a given diagnosis is not strictly followed. To test our hypothesis we generated 200,000 different injuries and injected 10% of the injuries with unnecessary treatments to reflect estimated industry prevalence levels. Using testing and training sets we found that Restricted Boltzmann Machines (RBMs) were able to reach AUCs of .95, lifts at 9.5 and recalls at 50%. Implementing our approach on real-world client datasets has shown performance levels that approach simulation performance despite additional noise.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/lasaga18a.html
  PDF: http://proceedings.mlr.press/v71/lasaga18a/lasaga18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-lasaga18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Lasaga
  - given: Prakash
    family: Santhana
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 114-120
  id: lasaga18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 114
  lastpage: 120
  published: 2018-01-07 00:00:00 +0000
- title: 'Sleuthing for adverse outcomes: Using anomaly detection to identify unusual behaviors of third-party agents'
  abstract: 'Business transactions between customers and financing entities are often governed by intermediary agents. In this scenario, actions taken by these agents can affect the likelihood of adverse outcomes for both the customers and the financial institution. Our goal is to establish a general framework that identifies these types of anomalous agents. In this paper, we demonstrate a novel application of anomaly detection using isolation forests to identify which agents may be associated with adverse outcomes. We apply a genetic algorithm to understand which features were key to the performance of anomaly detection and suggest a general framework for problems that similarly concern the behaviors of third-party agents.'
  volume: 71
  URL: https://proceedings.mlr.press/v71/miller18a.html
  PDF: http://proceedings.mlr.press/v71/miller18a/miller18a.pdf
  edit: https://github.com/mlresearch//v71/edit/gh-pages/_posts/2018-01-07-miller18a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance'
  publisher: 'PMLR'
  author: 
  - given: Michelle
    family: Miller
  - given: Robert
    family: Cezeaux
  editor: 
  - given: Archana
    family: Anandakrishnan
  - given: Senthil
    family: Kumar
  - given: Alexander
    family: Statnikov
  - given: Tanveer
    family: Faruquie
  - given: Di
    family: Xu
  page: 121-125
  id: miller18a
  issued:
    date-parts: 
      - 2018
      - 1
      - 7
  firstpage: 121
  lastpage: 125
  published: 2018-01-07 00:00:00 +0000