A comparison of AUC estimators in small-sample studies

Antti Airola; Tapio Pahikkala; Willem Waegeman; Bernard De Baets; Tapio Salakoski

Proceedings of the third International Workshop on Machine Learning in Systems Biology, PMLR 8:3-13, 2009.

Abstract

Reliable estimation of the classification performance of learned predictive models is difficult when working in the small-sample setting. When dealing with biological data, separate test data often cannot be afforded. In this case, cross-validation is a typical strategy for estimating performance. Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from extensive bias or variance when the area under the ROC curve (AUC) is used as the performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method we proposed previously can be used to calculate this estimate efficiently for regularized least-squares (RLS) based learners.
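
As a rough illustration of the leave-pair-out idea mentioned in the abstract, the sketch below holds out each positive-negative pair in turn, trains on the remaining examples, and counts how often the held-out positive is scored above the held-out negative. It is a naive version that retrains once per pair, not the efficient method the abstract refers to. The function names `rls_fit` and `lpocv_auc`, the regularization parameter `lam`, the linear model without a bias term, and the +1/-1 label encoding are all assumptions made for this example.

```python
import numpy as np


def rls_fit(X, y, lam=1.0):
    """Fit a linear regularized least-squares (ridge) model and return its weights."""
    d = X.shape[1]
    # Solve (X^T X + lam * I) w = X^T y for w.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def lpocv_auc(X, y, lam=1.0):
    """Naive leave-pair-out cross-validation estimate of AUC.

    Assumes labels in y are +1 (positive) and -1 (negative).
    """
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one example of each class")

    score = 0.0
    for i in pos:
        for j in neg:
            # Hold out one positive and one negative example.
            keep = np.ones(len(y), dtype=bool)
            keep[[i, j]] = False
            w = rls_fit(X[keep], y[keep], lam)
            # Compare the predictions for the held-out pair.
            diff = X[i] @ w - X[j] @ w
            if diff > 0:
                score += 1.0   # pair ranked correctly
            elif diff == 0:
                score += 0.5   # tie counts as half
    return score / (len(pos) * len(neg))
```

Calling `lpocv_auc(X, y)` on a feature matrix `X` of shape `(n, d)` and a label vector `y` returns a value between 0 and 1. The cost grows with the number of positive-negative pairs, since the model is refit for every pair.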

Cite this Paper

BibTeX


@InProceedings{pmlr-v8-airola10a,
  title = 	 {A comparison of AUC estimators in small-sample studies},
  author = 	 {Airola, Antti and Pahikkala, Tapio and Waegeman, Willem and De Baets, Bernard and Salakoski, Tapio},
  booktitle = 	 {Proceedings of the third International Workshop on Machine Learning in Systems Biology},
  pages = 	 {3--13},
  year = 	 {2009},
  editor = 	 {Džeroski, Sašo and Geurts, Pierre and Rousu, Juho},
  volume = 	 {8},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Ljubljana, Slovenia},
  month = 	 {05--06 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v8/airola10a/airola10a.pdf},
  url = 	 {https://proceedings.mlr.press/v8/airola10a.html},
  abstract = 	 {Reliable estimation of the classification performance of learned predictive models is difficult, when working in the small sample setting. When dealing with biological data it is often the case that separate test data cannot be afforded. Cross-validation is in this case a typical strategy for estimating the performance. Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from extensive bias or variance when the area under ROC curve (AUC) is used as performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method previously proposed by us can be used to efficiently calculate this estimate for regularized least-squares (RLS) based learners.}
}

Endnote

%0 Conference Paper
%T A comparison of AUC estimators in small-sample studies
%A Antti Airola
%A Tapio Pahikkala
%A Willem Waegeman
%A Bernard De Baets
%A Tapio Salakoski
%B Proceedings of the third International Workshop on Machine Learning in Systems Biology
%C Proceedings of Machine Learning Research
%D 2009
%E Sašo Džeroski
%E Pierre Geurts
%E Juho Rousu
%F pmlr-v8-airola10a
%I PMLR
%P 3--13
%U https://proceedings.mlr.press/v8/airola10a.html
%V 8
%X Reliable estimation of the classification performance of learned predictive models is difficult, when working in the small sample setting. When dealing with biological data it is often the case that separate test data cannot be afforded. Cross-validation is in this case a typical strategy for estimating the performance. Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from extensive bias or variance when the area under ROC curve (AUC) is used as performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method previously proposed by us can be used to efficiently calculate this estimate for regularized least-squares (RLS) based learners.

RIS


TY  - CPAPER
TI  - A comparison of AUC estimators in small-sample studies
AU  - Antti Airola
AU  - Tapio Pahikkala
AU  - Willem Waegeman
AU  - Bernard De Baets
AU  - Tapio Salakoski
BT  - Proceedings of the third International Workshop on Machine Learning in Systems Biology
DA  - 2009/03/02
ED  - Sašo Džeroski
ED  - Pierre Geurts
ED  - Juho Rousu
ID  - pmlr-v8-airola10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 8
SP  - 3
EP  - 13
L1  - http://proceedings.mlr.press/v8/airola10a/airola10a.pdf
UR  - https://proceedings.mlr.press/v8/airola10a.html
AB  - Reliable estimation of the classification performance of learned predictive models is difficult, when working in the small sample setting. When dealing with biological data it is often the case that separate test data cannot be afforded. Cross-validation is in this case a typical strategy for estimating the performance. Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from extensive bias or variance when the area under ROC curve (AUC) is used as performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method previously proposed by us can be used to efficiently calculate this estimate for regularized least-squares (RLS) based learners.
ER  -

APA


Airola, A., Pahikkala, T., Waegeman, W., De Baets, B., & Salakoski, T. (2009). A comparison of AUC estimators in small-sample studies. Proceedings of the third International Workshop on Machine Learning in Systems Biology, in Proceedings of Machine Learning Research 8:3-13. Available from https://proceedings.mlr.press/v8/airola10a.html.
