Loss-Proportional Subsampling for Subsequent ERM

Paul Mineiro, Nikos Karampatziakis
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):522-530, 2013.

Abstract

We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees.
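The general idea behind loss-proportional subsampling can be illustrated with a short sketch. This is not the authors' exact algorithm, just a hedged toy illustration under assumed details: each example is kept with probability proportional to its loss under a cheap pilot hypothesis (floored at a minimum rate), and retained examples get importance weights of 1/p so the weighted subsample risk remains an unbiased estimate of the full-data empirical risk. The loss values here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-example losses under a pilot hypothesis.
losses = rng.exponential(scale=1.0, size=10_000)

# Keep each point with probability proportional to its pilot loss,
# floored at p_min so low-loss points are never ignored entirely.
p_min = 0.05
probs = np.clip(losses / losses.max(), p_min, 1.0)
keep = rng.random(losses.size) < probs

# Importance weights 1/p make the subsample's weighted empirical risk
# an unbiased (Horvitz-Thompson) estimate of the full-data risk.
weights = 1.0 / probs[keep]

est_risk = (weights * losses[keep]).sum() / losses.size
full_risk = losses.mean()
# est_risk closely tracks full_risk while keep.sum() << losses.size
```

The subsample (with its weights) would then be handed to any downstream ERM procedure, such as the boosted trees used in the paper's experiments.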

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-mineiro13,
  title     = {Loss-Proportional Subsampling for Subsequent ERM},
  author    = {Mineiro, Paul and Karampatziakis, Nikos},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {522--530},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/mineiro13.pdf},
  url       = {https://proceedings.mlr.press/v28/mineiro13.html},
  abstract  = {We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees.}
}
Endnote
%0 Conference Paper
%T Loss-Proportional Subsampling for Subsequent ERM
%A Paul Mineiro
%A Nikos Karampatziakis
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-mineiro13
%I PMLR
%P 522--530
%U https://proceedings.mlr.press/v28/mineiro13.html
%V 28
%N 3
%X We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees.
RIS
TY  - CPAPER
TI  - Loss-Proportional Subsampling for Subsequent ERM
AU  - Paul Mineiro
AU  - Nikos Karampatziakis
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-mineiro13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 522
EP  - 530
L1  - http://proceedings.mlr.press/v28/mineiro13.pdf
UR  - https://proceedings.mlr.press/v28/mineiro13.html
AB  - We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees.
ER  -
APA
Mineiro, P. & Karampatziakis, N. (2013). Loss-Proportional Subsampling for Subsequent ERM. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):522-530. Available from https://proceedings.mlr.press/v28/mineiro13.html.