Which method learns most from the data? Methodological Issues in the Analysis of Comparative Studies

A. J. Feelders, W. J. H. Verkooijen
Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, PMLR R0:219-225, 1995.

Abstract

The mutual discovery of the statistical and artificial intelligence communities (see e.g. [Han93, CO94]) has resulted in many studies which compare the performance of statistical and machine learning methods on empirical data sets; examples are the StatLog project ([MST94]) and the Santa Fe Time Series Competition ([WG94]), as well as numerous journal articles ([KWR93, RABCK93, WHR90, TAF91, TK92, FG93]). What has struck us is the casual manner in which comparisons are typically carried out in the literature. The ranking of $k$ preselected methods is performed by training (estimating, in statistical terminology) them on a single data set and estimating their respective mean prediction errors (MPEs) from a hold-out sample. The methods are subsequently ranked according to their estimated MPEs. When the total number of observations is small, cross-validation rather than a hold-out sample is usually used to estimate the mean prediction errors. A more rigorous comparison of methods should include significance testing rather than a mere ranking based on the estimated MPEs. The statistical analysis of comparative studies, method ranking in particular, is addressed in this paper. Specifically, we address methodological issues of studies in which the performance of several regression or classification methods is compared on empirical data sets.
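
As a concrete illustration of the procedure discussed above, the sketch below (not taken from the paper; the data set, the two methods, and the choice of test are illustrative assumptions) estimates the hold-out MPE of two regression methods and then applies a paired t-test to the per-observation squared-error differences, rather than merely ranking the two estimated MPEs.

```python
# Minimal sketch: estimate mean prediction errors (MPEs) of two methods on a
# hold-out sample and test whether their difference is statistically
# significant, instead of only ranking the point estimates.
import numpy as np
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for an empirical data set (illustrative only).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

methods = {
    "linear regression": LinearRegression(),
    "regression tree": DecisionTreeRegressor(random_state=0),
}

# Per-observation squared errors on the same hold-out sample for each method.
errors = {}
for name, model in methods.items():
    model.fit(X_train, y_train)  # "training" = estimation in statistical terms
    errors[name] = (y_test - model.predict(X_test)) ** 2

for name, e in errors.items():
    print(f"{name}: estimated MPE = {e.mean():.2f}")

# Both methods are evaluated on identical hold-out cases, so the comparison
# is paired: test whether the mean squared-error difference is zero.
diff = errors["linear regression"] - errors["regression tree"]
t_stat, p_value = stats.ttest_1samp(diff, popmean=0.0)
print(f"paired t-test on squared-error differences: t = {t_stat:.2f}, p = {p_value:.3f}")
```

When the data set is too small to spare a hold-out sample, the same paired comparison can be carried out on cross-validated per-observation errors instead.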

Cite this Paper


BibTeX
@InProceedings{pmlr-vR0-feelders95a,
  title     = {Which method learns most from the data? Methodological Issues in the Analysis of Comparative Studies},
  author    = {Feelders, A. J. and Verkooijen, W. J. H.},
  booktitle = {Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics},
  pages     = {219--225},
  year      = {1995},
  editor    = {Fisher, Doug and Lenz, Hans-Joachim},
  volume    = {R0},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--07 Jan},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/r0/feelders95a/feelders95a.pdf},
  url       = {https://proceedings.mlr.press/r0/feelders95a.html},
  abstract  = {The mutual discovery of the statistical and artificial intelligence communities (see e.g. [Han93, CO94]) has resulted in many studies which compare the performance of statistical and machine learning methods on empirical data sets; examples are the StatLog project ([MST94]) and the Santa Fe Time Series Competition ([WG94]), as well as numerous journal articles ([KWR93, RABCK93, WHR90, TAF91, TK92, FG93]). What has struck us is the casual manner in which comparisons are typically carried out in the literature. The ranking of $k$ preselected methods is performed by training (estimating, in statistical terminology) them on a single data set and estimating their respective mean prediction errors (MPEs) from a hold-out sample. The methods are subsequently ranked according to their estimated MPEs. When the total number of observations is small, cross-validation rather than a hold-out sample is usually used to estimate the mean prediction errors. A more rigorous comparison of methods should include significance testing rather than a mere ranking based on the estimated MPEs. The statistical analysis of comparative studies, method ranking in particular, is addressed in this paper. Specifically, we address methodological issues of studies in which the performance of several regression or classification methods is compared on empirical data sets.},
  note      = {Reissued by PMLR on 01 May 2022.}
}
APA
Feelders, A. J. & Verkooijen, W. J. H. (1995). Which method learns most from the data? Methodological Issues in the Analysis of Comparative Studies. Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R0:219-225. Available from https://proceedings.mlr.press/r0/feelders95a.html. Reissued by PMLR on 01 May 2022.
