Look before you leap: Some insights into learner evaluation with cross-validation

Gitte Vanwinckelen, Hendrik Blockeel
Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD, PMLR 47:3-20, 2015.

Abstract

Machine learning is largely an experimental science, and the evaluation of predictive models is an important aspect of it. These days, cross-validation is the most widely used method for this task. There are, however, a number of important points to take into account when using this methodology. First, one should clearly state what one is trying to estimate: a distinction must be made between evaluating a model learned on a single dataset, and evaluating a learner trained on a random sample from a given data population. These two questions require different statistical approaches and should not be confused. While this has been noted before, the literature on the topic is generally not very accessible. This paper tries to give an understandable overview of the statistical aspects of these two evaluation tasks. We also argue that, because of the often limited availability of data and the difficulty of selecting an appropriate statistical test, it may in some cases be better to abstain from statistical testing and instead focus on an interpretation of the immediate results.
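
To make the two estimands concrete, here is a minimal sketch (not from the paper; the learner, dataset size, and repetition counts are illustrative assumptions). It treats a large synthetic sample as the "population" and compares the usual cross-validation estimate on one dataset with (1) the accuracy of the model trained on that dataset and (2) the expected accuracy of the learner over random datasets of the same size.

# Minimal sketch, assuming synthetic data; all names and parameters
# here are illustrative, not taken from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# A large synthetic sample standing in for the data population.
X_pop, y_pop = make_classification(n_samples=100_000, n_features=20,
                                   random_state=0)

def draw_dataset(n=200):
    """Draw a random dataset of size n from the population."""
    idx = rng.choice(len(X_pop), size=n, replace=False)
    return X_pop[idx], y_pop[idx]

learner = LogisticRegression(max_iter=1000)

# One concrete dataset, as a practitioner would have.
X, y = draw_dataset()

# The usual 10-fold cross-validation estimate on that single dataset.
cv_estimate = cross_val_score(learner, X, y, cv=10).mean()

# Question 1: accuracy of the *model* trained on this one dataset,
# measured against the population (training points are a negligible
# fraction of the 100k population here).
model_accuracy = learner.fit(X, y).score(X_pop, y_pop)

# Question 2: expected accuracy of the *learner* over random datasets
# of the same size drawn from the population.
learner_accuracy = np.mean([
    LogisticRegression(max_iter=1000).fit(*draw_dataset()).score(X_pop, y_pop)
    for _ in range(50)
])

print(f"CV estimate:      {cv_estimate:.3f}")
print(f"Model accuracy:   {model_accuracy:.3f}")    # question 1
print(f"Learner accuracy: {learner_accuracy:.3f}")  # question 2

The point of the contrast: a single cross-validation run produces one number, but whether it is read as an estimate of the model's accuracy or of the learner's expected accuracy changes which statistical treatment is appropriate.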

Cite this Paper


BibTeX
@InProceedings{pmlr-v47-vanwinckelen14a,
  title     = {Look before you leap: Some insights into learner evaluation with cross-validation},
  author    = {Vanwinckelen, Gitte and Blockeel, Hendrik},
  booktitle = {Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD},
  pages     = {3--20},
  year      = {2015},
  editor    = {Hämäläinen, Wilhelmiina and Petitjean, François and Webb, Geoffrey I.},
  volume    = {47},
  series    = {Proceedings of Machine Learning Research},
  address   = {Nancy, France},
  month     = {15 Sep},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v47/vanwinckelen14a.pdf},
  url       = {https://proceedings.mlr.press/v47/vanwinckelen14a.html}
}
APA
Vanwinckelen, G., & Blockeel, H. (2015). Look before you leap: Some insights into learner evaluation with cross-validation. Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD, in Proceedings of Machine Learning Research 47:3-20. Available from https://proceedings.mlr.press/v47/vanwinckelen14a.html.