On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance

Aditya Menon; Harikrishna Narasimhan; Shivani Agarwal; Sanjay Chawla

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):603-611, 2013.

Abstract

Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem, in machine learning as well as in data mining, artificial intelligence, and various applied fields. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some simple methods that have been used in practice, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Our results employ balanced losses that have been used recently in analyses of ranking problems (Kotlowski et al., 2011) and build on recent results on consistent surrogates for cost-sensitive losses (Scott, 2012). Experimental results confirm our consistency theorems.
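As an illustration of the AM measure and the thresholding idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes labels in {0, 1}, uses the empirical fraction of positives as the threshold (one natural choice, assumed here), and runs on made-up probability estimates; the function names `am_score` and `threshold_at_base_rate` are hypothetical.

```python
def am_score(y_true, y_pred):
    """Arithmetic mean of the true positive rate and the true negative rate."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p == 0) / len(neg)
    return (tpr + tnr) / 2


def threshold_at_base_rate(y_train, probs):
    """Predict 1 when the estimated class probability exceeds the
    empirical fraction of positives (an assumed threshold choice)."""
    p_hat = sum(y_train) / len(y_train)
    return [1 if q > p_hat else 0 for q in probs], p_hat


# Toy data (invented for illustration): 1 positive among 10 examples.
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical class probability estimates for the same 10 examples.
probs = [0.3, 0.05, 0.2, 0.02, 0.08, 0.01, 0.03, 0.04, 0.06, 0.07]

# Thresholding at 0.5 predicts every example as negative.
pred_half = [1 if q > 0.5 else 0 for q in probs]
accuracy_half = sum(int(t == p) for t, p in zip(y, pred_half)) / len(y)
am_half = am_score(y, pred_half)

# Thresholding at the empirical base rate recovers the rare positive.
pred_base, p_hat = threshold_at_base_rate(y, probs)
am_base = am_score(y, pred_base)

print(accuracy_half, am_half, p_hat, am_base)
```

On this toy data the 0.5 threshold reaches 90% accuracy while its AM is only 0.5, which is the sense in which misclassification error is ill-suited to imbalanced settings; the base-rate threshold gives an AM of 17/18.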

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-menon13a,
  title = 	 {On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance},
  author = 	 {Menon, Aditya and Narasimhan, Harikrishna and Agarwal, Shivani and Chawla, Sanjay},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {603--611},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/menon13a.pdf},
  url = 	 {https://proceedings.mlr.press/v28/menon13a.html},
  abstract = 	 {Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem, in machine learning as well as in data mining, artificial intelligence, and various applied fields. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some simple methods that have been used in practice, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Our results employ balanced losses that have been used recently in analyses of ranking problems (Kotlowski et al., 2011) and build on recent results on consistent surrogates for cost-sensitive losses (Scott, 2012). Experimental results confirm our consistency theorems.  }
}

Endnote

%0 Conference Paper
%T On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance
%A Aditya Menon
%A Harikrishna Narasimhan
%A Shivani Agarwal
%A Sanjay Chawla
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-menon13a
%I PMLR
%P 603--611
%U https://proceedings.mlr.press/v28/menon13a.html
%V 28
%N 3
%X Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem, in machine learning as well as in data mining, artificial intelligence, and various applied fields. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some simple methods that have been used in practice, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Our results employ balanced losses that have been used recently in analyses of ranking problems (Kotlowski et al., 2011) and build on recent results on consistent surrogates for cost-sensitive losses (Scott, 2012). Experimental results confirm our consistency theorems.

RIS


TY  - CPAPER
TI  - On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance
AU  - Aditya Menon
AU  - Harikrishna Narasimhan
AU  - Shivani Agarwal
AU  - Sanjay Chawla
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-menon13a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 603
EP  - 611
L1  - http://proceedings.mlr.press/v28/menon13a.pdf
UR  - https://proceedings.mlr.press/v28/menon13a.html
AB  - Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem, in machine learning as well as in data mining, artificial intelligence, and various applied fields. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some simple methods that have been used in practice, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Our results employ balanced losses that have been used recently in analyses of ranking problems (Kotlowski et al., 2011) and build on recent results on consistent surrogates for cost-sensitive losses (Scott, 2012). Experimental results confirm our consistency theorems.  
ER  -

APA


Menon, A., Narasimhan, H., Agarwal, S., & Chawla, S. (2013). On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):603-611. Available from https://proceedings.mlr.press/v28/menon13a.html.
