Multiple-source cross-validation

Krzysztof Geras, Charles Sutton
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1292-1300, 2013.

Abstract

Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common practice to arrange the cross-validation procedure so that it takes the source structure into account. Although common in practice, this procedure does not appear to have been analysed theoretically. We present new estimators of the variance of the cross-validation estimate, both in the multiple-source setting and in the standard iid setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests when comparing algorithms.
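To make the distinction concrete, here is a minimal sketch (not the paper's method) contrasting standard K-fold cross-validation, which assigns points to folds at random, with source-aware, leave-one-source-out cross-validation. It uses scikit-learn; the dataset, model, and source labels are hypothetical, and the "naive" standard errors it prints are the usual independence-based estimates whose inaccuracy motivates the paper's improved variance estimators.

    # Sketch: random-fold CV vs. source-aware CV (hypothetical data and model).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

    # Hypothetical data: 300 points drawn from 5 sources (e.g. different authors).
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    sources = np.repeat(np.arange(5), 60)  # source label for each data point

    model = LogisticRegression(max_iter=1000)

    # Standard K-fold CV: points are assigned to folds at random, which
    # implicitly assumes the data are exchangeable.
    kfold_scores = cross_val_score(
        model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

    # Multiple-source CV: each fold is an entire source, so the test set
    # always comes from a source unseen during training.
    logo_scores = cross_val_score(
        model, X, y, groups=sources, cv=LeaveOneGroupOut())

    # Naive variance estimates treat fold scores as independent, which they
    # are not -- the correlation between folds is what the paper's
    # estimators account for.
    for name, s in [("k-fold", kfold_scores), ("per-source", logo_scores)]:
        print(name, "mean:", s.mean(),
              "naive std err:", s.std(ddof=1) / np.sqrt(len(s)))

Note that in the per-source arrangement each test score reflects generalisation to a genuinely unseen source, which is the quantity of interest in the transfer-learning sense described above.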

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-geras13,
  title     = {Multiple-source cross-validation},
  author    = {Geras, Krzysztof and Sutton, Charles},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {1292--1300},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/geras13.pdf},
  url       = {https://proceedings.mlr.press/v28/geras13.html}
}
APA
Geras, K. & Sutton, C. (2013). Multiple-source cross-validation. In Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research 28(3):1292-1300. Available from https://proceedings.mlr.press/v28/geras13.html.