The Hidden Cost of Fraud: An Instance-Dependent Cost-Sensitive Approach for Positive and Unlabeled Learning
Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 183:53-67, 2022.
Abstract
Financial institutions face increasing pressure to implement better and faster fraud detection systems to minimize the cost of fraud. This issue has attracted growing attention in the literature in recent years. Despite its practical relevance, few works have considered label uncertainty in fraud detection. Incomplete label information arises naturally because fraudsters strive to go undetected. Most fraud detection systems operate by spending more resources to investigate only a few suspicious cases and quickly processing the rest as unsuspicious. That is, we only have positive labels for some fraudsters, whereas the remaining positives, together with legitimate non-fraudsters, remain unlabeled. This setting is referred to as learning from positive and unlabeled data, or PU learning. Besides the issue of undetected fraudsters, fraud detection is commonly regarded as a cost-sensitive classification task in which the misclassification cost can vary substantially between examples. Thus, this work introduces a novel technique that integrates PU learning and the instance-dependent cost-sensitive framework: PU-CSBoost. PU-CSBoost can directly minimize financial loss through an instance-dependent cost measure that also incorporates the misclassification cost due to hidden fraudsters. Our empirical analysis compares PU-CSBoost with CSBoost, its non-PU counterpart, and with other PU techniques specialized in imbalanced learning. The experimental results highlight PU-CSBoost's potential to reduce financial losses under the PU setting. Moreover, the results suggest a rapid drop in the cost-sensitive performance of CSBoost when hidden fraudsters are present. Thus, ignoring the issue of hidden fraudsters can lead to underwhelming cost savings for techniques based on the cost-sensitive framework.
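To make the notion of a hidden cost concrete, the following is a minimal sketch of how an instance-dependent cost can be underestimated when only some fraudsters are labeled. It is not the PU-CSBoost objective from the paper. The cost matrix (a fixed investigation cost for every flagged case, the transaction amount lost for every missed fraud), the function name `total_cost`, and the toy data are all assumptions made for illustration.

```python
def total_cost(labels, predictions, amounts, investigation_cost):
    """Sum instance-dependent costs under an assumed, illustrative cost matrix.

    - predicted fraud (flagged): pay a fixed investigation cost
    - predicted legitimate but labeled fraud: lose the transaction amount
    - predicted legitimate and labeled legitimate: no cost
    """
    cost = 0.0
    for label, pred, amount in zip(labels, predictions, amounts):
        if pred == 1:
            cost += investigation_cost
        elif label == 1:
            cost += amount
    return cost


# Hypothetical toy data: three real fraudsters, only the first was detected.
true_labels = [1, 1, 0, 0, 1]      # ground truth (unknown in practice)
observed_labels = [1, 0, 0, 0, 0]  # PU labels: hidden fraudsters look unlabeled
amounts = [100.0, 500.0, 50.0, 20.0, 300.0]
predictions = [1, 0, 0, 0, 0]      # a classifier that flags only the first case
investigation_cost = 10.0

cost_true = total_cost(true_labels, predictions, amounts, investigation_cost)
cost_observed = total_cost(observed_labels, predictions, amounts, investigation_cost)
hidden_cost = cost_true - cost_observed

print(f"cost on true labels:     {cost_true}")
print(f"cost on observed labels: {cost_observed}")
print(f"cost hidden by PU labels: {hidden_cost}")
```

In this toy case the classifier looks almost free of cost when judged against the observed labels, while the two undetected fraudsters account for most of the actual loss. That gap is the kind of cost a measure blind to hidden fraudsters would miss.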