Look before you leap: Some insights into learner evaluation with cross-validation
Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD, PMLR 47:3-20, 2015.
Abstract
Machine learning is largely an experimental science, and the evaluation of predictive models is an important aspect of it. These days, cross-validation is the most widely used method for this task. There are, however, a number of important points that should be taken into account when using this methodology. First, one should clearly state what one is trying to estimate. In particular, a distinction should be made between evaluating a model learned on a single dataset and evaluating a learner trained on a random sample from a given data population. These two questions require different statistical approaches and should not be confused with each other. While this has been noted before, the literature on this topic is generally not very accessible. This paper tries to give an understandable overview of the statistical aspects of these two evaluation tasks. We also argue that, because of the often limited availability of data and the difficulty of selecting an appropriate statistical test, it may in some cases be better to abstain from statistical testing and instead focus on interpreting the immediate results.
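To make the distinction concrete, the sketch below runs a standard k-fold cross-validation loop in Python with scikit-learn; the dataset, learner, and fold count are illustrative assumptions, not taken from the paper. The same averaged score is often quoted both as an estimate of how the single model refit on all the data will perform, and as an estimate of the learner's expected performance over random training samples of this size.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    # Toy dataset standing in for "a single dataset drawn from some population"
    # (illustrative choice, not the paper's data).
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # Standard k-fold cross-validation: each fold trains a fresh model on the
    # remaining folds and scores it on the held-out fold.
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    fold_scores = []
    for train_idx, test_idx in kf.split(X):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(model.score(X[test_idx], y[test_idx]))

    cv_estimate = float(np.mean(fold_scores))

    # Reading 1 (model evaluation): treat cv_estimate as a proxy for how THIS
    # model, refit on the full dataset, will perform on new data.
    final_model = LogisticRegression(max_iter=1000).fit(X, y)

    # Reading 2 (learner evaluation): treat cv_estimate as an estimate of the
    # EXPECTED performance of the learning procedure over random training
    # samples of roughly this size from the same population.
    print(f"cross-validated accuracy: {cv_estimate:.3f}")

Which reading is intended determines what variance should be attached to the cross-validated score and which significance test, if any, is appropriate, which is the statistical question the paper examines.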