GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare

Arpit Agarwal; Harikrishna Narasimhan; Shivaram Kalyanakrishnan; Shivani Agarwal

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1989-1997, 2014.

Abstract

We consider the problem of binary class probability estimation (CPE) when one class is rare compared to the other. It is well known that standard algorithms such as logistic regression do not perform well on this task as they tend to under-estimate the probability of the rare class. Common fixes include under-sampling and weighting, together with various correction schemes. Recently, Wang & Dey (2010) suggested the use of a parametrized family of asymmetric link functions based on the generalized extreme value (GEV) distribution, which has been used for modeling rare events in statistics. The approach showed promising initial results, but combined with the logarithmic CPE loss implicitly used in their work, it results in a non-convex composite loss that is difficult to optimize. In this paper, we use tools from the theory of proper composite losses (Buja et al., 2005; Reid & Williamson, 2010) to construct a canonical underlying CPE loss corresponding to the GEV link, which yields a convex proper composite loss that we call the GEV-canonical loss; this loss is tailored for the task of CPE when one class is rare, and is easy to minimize using an IRLS-type algorithm similar to that used for logistic regression. Our experiments on both synthetic and real data demonstrate that the resulting algorithm – which we term GEV-canonical regression – outperforms common approaches such as under-sampling and weight correction for this problem.
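The abstract's key claim is that pairing the GEV link with its canonical loss gives a convex objective that an IRLS-type method can minimize. The Python sketch below illustrates that idea under stated assumptions. It assumes the GEV inverse link takes the Wang & Dey style form p = 1 - exp(-(1 - xi*f)_+^(-1/xi)) with location 0 and scale 1 (our assumption about the parametrization; xi -> 0 recovers the complementary log-log link). It also relies on a general property of canonical proper composite losses: the derivative of the per-example loss with respect to the score f is p(f) - y. The gradient is therefore X^T(p - y)/n and the Hessian is X^T diag(dp/df) X/n, so each step is a weighted least-squares (Newton) solve, as in logistic regression. The function names `gev_inverse_link` and `fit_gev_canonical`, the small ridge term `lam`, and the line search are hypothetical, illustrative choices, not the authors' implementation.

```python
import numpy as np


def gev_inverse_link(f, xi):
    """Hypothetical helper: p = 1 - exp(-(1 - xi*f)_+^(-1/xi)) and dp/df.

    Assumed GEV parametrization (location 0, scale 1). As xi -> 0 this tends
    to the complementary log-log inverse link p = 1 - exp(-exp(f)). Outside the
    support (1 - xi*f <= 0), p is 1 if xi > 0 and 0 if xi < 0, with zero slope.
    """
    f = np.asarray(f, dtype=float)
    if abs(xi) < 1e-8:
        base = np.ones_like(f)
        inside = np.ones(f.shape, dtype=bool)
        log_t = f
    else:
        raw = 1.0 - xi * f
        inside = raw > 0
        base = np.where(inside, raw, 1.0)  # placeholder value outside the support
        log_t = -np.log(base) / xi
    # t = -log(1 - p); capped so exp() cannot overflow (p is already 1.0 there)
    t = np.exp(np.minimum(log_t, 6.5))
    p = -np.expm1(-t)
    dp = np.exp(-t) * t / base  # dt/df = t / base and dp/dt = exp(-t)
    outside_value = 1.0 if xi > 0 else 0.0
    p = np.where(inside, p, outside_value)
    dp = np.where(inside, dp, 0.0)
    return p, dp


def fit_gev_canonical(X, y, xi, lam=1e-4, max_iter=50, tol=1e-8):
    """Newton/IRLS-style sketch, assuming the canonical gradient X^T (p - y) / n.

    The ridge term lam (also applied to the intercept) is our own addition
    to keep the Hessian invertible.
    """
    n, d = X.shape
    w = np.zeros(d)

    def grad(v):
        p, _ = gev_inverse_link(X @ v, xi)
        return X.T @ (p - y) / n + lam * v

    for _ in range(max_iter):
        p, dp = gev_inverse_link(X @ w, xi)
        g = X.T @ (p - y) / n + lam * w
        if np.linalg.norm(g) < tol:
            break
        H = (X.T * dp) @ X / n + lam * np.eye(d)  # weighted least-squares matrix
        step = np.linalg.solve(H, -g)
        # The loss is convex, so its derivative along `step` is non-decreasing:
        # take the full step if it is still downhill, else bisect for the root.
        t = 1.0
        if grad(w + step) @ step > 0:
            lo, hi = 0.0, 1.0
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                if grad(w + mid * step) @ step > 0:
                    hi = mid
                else:
                    lo = mid
            t = lo
        w = w + t * step
    return w


# Hypothetical demo: synthetic data whose labels come from a GEV link (xi = 0.2)
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
w_true = np.array([-4.0, 1.0, -0.5])
p_true, _ = gev_inverse_link(X @ w_true, 0.2)
y = (rng.uniform(size=n) < p_true).astype(float)
w_hat = fit_gev_canonical(X, y, xi=0.2)
```

The sketch never evaluates the GEV-canonical loss itself. The Newton step needs only dp/df, and the line search bisects on the directional derivative instead of on loss values. The abstract does not say whether this matches the paper's exact IRLS variant, for example in how xi is chosen or how scores outside the link's support are handled.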

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-agarwalc14,
  title = 	 {GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare},
  author = 	 {Agarwal, Arpit and Narasimhan, Harikrishna and Kalyanakrishnan, Shivaram and Agarwal, Shivani},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {1989--1997},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/agarwalc14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/agarwalc14.html},
  abstract = 	 {We consider the problem of binary class probability estimation (CPE) when one class is rare compared to the other. It is well known that standard algorithms such as logistic regression do not perform well on this task as they tend to under-estimate the probability of the rare class. Common fixes include under-sampling and weighting, together with various correction schemes. Recently, Wang & Dey (2010) suggested the use of a parametrized family of asymmetric link functions based on the generalized extreme value (GEV) distribution, which has been used for modeling rare events in statistics. The approach showed promising initial results, but combined with the logarithmic CPE loss implicitly used in their work, it results in a non-convex composite loss that is difficult to optimize. In this paper, we use tools from the theory of proper composite losses (Buja et al, 2005; Reid & Williamson, 2010) to construct a canonical underlying CPE loss corresponding to the GEV link, which yields a convex proper composite loss that we call the GEV-canonical loss; this loss is tailored for the task of CPE when one class is rare, and is easy to minimize using an IRLS-type algorithm similar to that used for logistic regression. Our experiments on both synthetic and real data demonstrate that the resulting algorithm – which we term GEV-canonical regression – outperforms common approaches such as under-sampling and weights correction for this problem.}
}

Endnote

%0 Conference Paper
%T GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare
%A Arpit Agarwal
%A Harikrishna Narasimhan
%A Shivaram Kalyanakrishnan
%A Shivani Agarwal
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-agarwalc14
%I PMLR
%P 1989--1997
%U https://proceedings.mlr.press/v32/agarwalc14.html
%V 32
%N 2
%X We consider the problem of binary class probability estimation (CPE) when one class is rare compared to the other. It is well known that standard algorithms such as logistic regression do not perform well on this task as they tend to under-estimate the probability of the rare class. Common fixes include under-sampling and weighting, together with various correction schemes. Recently, Wang & Dey (2010) suggested the use of a parametrized family of asymmetric link functions based on the generalized extreme value (GEV) distribution, which has been used for modeling rare events in statistics. The approach showed promising initial results, but combined with the logarithmic CPE loss implicitly used in their work, it results in a non-convex composite loss that is difficult to optimize. In this paper, we use tools from the theory of proper composite losses (Buja et al, 2005; Reid & Williamson, 2010) to construct a canonical underlying CPE loss corresponding to the GEV link, which yields a convex proper composite loss that we call the GEV-canonical loss; this loss is tailored for the task of CPE when one class is rare, and is easy to minimize using an IRLS-type algorithm similar to that used for logistic regression. Our experiments on both synthetic and real data demonstrate that the resulting algorithm – which we term GEV-canonical regression – outperforms common approaches such as under-sampling and weights correction for this problem.

RIS


TY  - CPAPER
TI  - GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare
AU  - Arpit Agarwal
AU  - Harikrishna Narasimhan
AU  - Shivaram Kalyanakrishnan
AU  - Shivani Agarwal
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-agarwalc14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1989
EP  - 1997
L1  - http://proceedings.mlr.press/v32/agarwalc14.pdf
UR  - https://proceedings.mlr.press/v32/agarwalc14.html
AB  - We consider the problem of binary class probability estimation (CPE) when one class is rare compared to the other. It is well known that standard algorithms such as logistic regression do not perform well on this task as they tend to under-estimate the probability of the rare class. Common fixes include under-sampling and weighting, together with various correction schemes. Recently, Wang & Dey (2010) suggested the use of a parametrized family of asymmetric link functions based on the generalized extreme value (GEV) distribution, which has been used for modeling rare events in statistics. The approach showed promising initial results, but combined with the logarithmic CPE loss implicitly used in their work, it results in a non-convex composite loss that is difficult to optimize. In this paper, we use tools from the theory of proper composite losses (Buja et al, 2005; Reid & Williamson, 2010) to construct a canonical underlying CPE loss corresponding to the GEV link, which yields a convex proper composite loss that we call the GEV-canonical loss; this loss is tailored for the task of CPE when one class is rare, and is easy to minimize using an IRLS-type algorithm similar to that used for logistic regression. Our experiments on both synthetic and real data demonstrate that the resulting algorithm – which we term GEV-canonical regression – outperforms common approaches such as under-sampling and weights correction for this problem.
ER  -

APA


Agarwal, A., Narasimhan, H., Kalyanakrishnan, S. & Agarwal, S. (2014). GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1989-1997. Available from https://proceedings.mlr.press/v32/agarwalc14.html.