Multiple-source cross-validation

Krzysztof Geras; Charles Sutton

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1292-1300, 2013.

Abstract

Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.
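
The abstract contrasts two ways of building the test sets. The Python sketch below illustrates that contrast under stated assumptions. All function names are hypothetical, and this is not the authors' code.

- `random_folds` assigns points to k test sets at random, ignoring where each point came from.
- `source_folds` keeps every point from a given source in the same test set. It assumes leave-one-source-out as the source-aware arrangement, and the paper's exact construction may differ.
- `sources_leaked` counts folds whose test sources also appear in training.
- `cv_estimate` returns the mean fold loss together with the naive variance estimate. That estimate treats fold losses as independent even though the training sets overlap. It is shown only as a baseline and is not the variance estimator proposed in the paper.

```python
import random
from statistics import mean, variance


def random_folds(n_points, k, seed=0):
    """Standard CV: shuffle indices, then deal them round-robin into k test sets.
    This implicitly treats the points as exchangeable."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[f::k]) for f in range(k)]


def source_folds(sources):
    """Source-aware CV, assumed here to be leave-one-source-out:
    all points sharing a source label form one test set."""
    by_source = {}
    for i, s in enumerate(sources):
        by_source.setdefault(s, []).append(i)
    return [by_source[s] for s in sorted(by_source)]


def sources_leaked(folds, sources):
    """Count test folds whose sources also appear in the matching training set."""
    leaked = 0
    for test in folds:
        test_set = set(test)
        test_src = {sources[i] for i in test}
        train_src = {sources[i] for i in range(len(sources)) if i not in test_set}
        leaked += bool(test_src & train_src)
    return leaked


def cv_estimate(fold_losses):
    """Mean of per-fold losses and the naive variance of that mean.
    The naive estimate ignores correlation between folds; it is NOT the paper's estimator."""
    k = len(fold_losses)
    return mean(fold_losses), variance(fold_losses) / k
```

For example, with `sources = ["a", "a", "b", "b", "b", "c"]`, `source_folds(sources)` gives one test set per source, `[[0, 1], [2, 3, 4], [5]]`, so `sources_leaked` reports no leakage. A random split of the same points will usually place points from one source on both sides of a split.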

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-geras13,
  title = 	 {Multiple-source cross-validation},
  author = 	 {Geras, Krzysztof and Sutton, Charles},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {1292--1300},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number = 	 {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/geras13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/geras13.html},
  abstract = 	 {Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.}
}

Endnote

%0 Conference Paper
%T Multiple-source cross-validation
%A Krzysztof Geras
%A Charles Sutton
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-geras13
%I PMLR
%P 1292--1300
%U https://proceedings.mlr.press/v28/geras13.html
%V 28
%N 3
%X Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.

RIS


TY  - CPAPER
TI  - Multiple-source cross-validation
AU  - Krzysztof Geras
AU  - Charles Sutton
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-geras13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1292
EP  - 1300
L1  - http://proceedings.mlr.press/v28/geras13.pdf
UR  - https://proceedings.mlr.press/v28/geras13.html
AB  - Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.
ER  -

APA


Geras, K. & Sutton, C. (2013). Multiple-source cross-validation. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1292-1300. Available from https://proceedings.mlr.press/v28/geras13.html.