[edit]
Binned Kernels for Anomaly Detection in Multi-timescale Data using Gaussian Processes
Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance, PMLR 71:102-113, 2018.
Abstract
Financial services and technology companies invest significantly in monitoring their complex technology infrastructures to allow for quick responses to technology failures. Because of the volume and velocity of signals monitored (e.g., customer transaction volume, API calls, server CPU utilization, etc.), they require sophisticated models of normal system behavior to determine when a component falls into an anomalous state. Gaussian processes (GPs) are flexible, Bayesian nonparametric models that have successfully been used for time series forecasting, interpolation, and anomaly detection in complex data sets. Despite the growing use of GPs for time series analysis in the literature, these methods scale poorly with the size of the data. In particular, data sets containing multiple timescales can pose a problem for GPs, as they can require a large number of points for training. We describe a novel method for including long and short timescale information without including an impractical number of data points through the use of a binned process, defined as the definite integral over a latent Gaussian process. This results in a binned covariance function for the time series, which we use to fit and forecast data at multiple resolutions. The resulting models achieve higher accuracy with fewer data points than their non-binned counterparts, and are more robust to long tailed noise, heteroskedasticity, and data artifacts.