Detecting Accounting Frauds in Publicly Traded U.S. Firms: A Machine Learning Approach
Asian Conference on Machine Learning, PMLR 45:173-188, 2016.
This paper studies how machine learning techniques can facilitate the detection of accounting fraud in publicly traded US firms. Existing studies often mimic human experts and use financial or nonfinancial ratios as features. We depart from these studies by adopting raw accounting variables, which are directly available from a firm's financial statements and can therefore be applied to new firms at low cost. Further, we collect the most complete fraud dataset of US publicly traded firms and label the fraud and non-fraud firm-years. One key issue is that the dataset is extremely imbalanced: fraud firm-years often account for less than one percent of the observations. Rather than re-sampling the data, we tackle the imbalance by adopting techniques from imbalanced learning. In particular, we employ the linear and nonlinear Biased Penalty Support Vector Machine and Ensemble Methods, both of which have been shown to handle class imbalance successfully in the machine learning literature. Finally, we evaluate our approach through extensive empirical studies. The results show that the proposed approach achieves much better performance, in terms of balanced accuracy, than the state of the art. Our approaches are also computationally fast, which further supports their practical deployment.
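To illustrate why balanced accuracy, rather than plain accuracy, is the natural metric when fraud firm-years are around one percent of the data, here is a minimal Python sketch. It assumes the standard definition of balanced accuracy as the mean of the true positive rate and the true negative rate; the abstract does not spell out the paper's exact formula. The toy labels, the `always_clean` baseline, and the `detector` predictions are hypothetical and are not taken from the paper's dataset or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of firm-years classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate.

    Labels are assumed to be 1 (fraud) and 0 (non-fraud).
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical toy data: 1000 firm-years, of which 10 (1%) are fraud.
y_true = [1] * 10 + [0] * 990

# Baseline that never flags fraud.
always_clean = [0] * 1000

# Hypothetical detector: catches 7 of 10 frauds, wrongly flags 99 of 990 clean firm-years.
detector = [1] * 7 + [0] * 3 + [1] * 99 + [0] * 891

for name, pred in [("always_clean", always_clean), ("detector", detector)]:
    print(name,
          "accuracy=%.3f" % accuracy(y_true, pred),
          "balanced_accuracy=%.3f" % balanced_accuracy(y_true, pred))
```

On this toy data the trivial baseline reaches an accuracy of about 0.99 but a balanced accuracy of only 0.5, while the hypothetical detector has a lower accuracy (about 0.898) and a higher balanced accuracy (about 0.8). This is the sense in which plain accuracy can reward ignoring the rare fraud class.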