An Automated System for Data Attribute Anomaly Detection


David Love, Nalin Aggarwal, Alexander Statnikov, Chao Yuan
Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance, PMLR 71:95-101, 2018.


We introduce DataQC, an automated system for detecting anomalies in data attributes, with the aim of improving data quality. Large organizations often follow non-standardized or inconsistent data quality checking practices across departments. The key motivations for developing such a system are to 1) establish a standard for anomaly detection, 2) facilitate quick identification of obvious anomalies, 3) reduce reliance on human judgment in data anomaly detection, and 4) facilitate prompt corrective action by data scientists. Most of the methods and techniques used in developing this system are well known and widely used by finance professionals who work with data. Our contribution is a system that improves the overall efficiency, interpretability, and objectivity of detecting data attribute anomalies.
