Adaptive Sampling Scheme for Learning in Severely Imbalanced Large Scale Data

Wei Zhang; Said Kobeissi; Scott Tomko; Chris Challis

Proceedings of the Ninth Asian Conference on Machine Learning, PMLR 77:240-247, 2017.

Abstract

Imbalanced data poses a serious challenge for many machine learning and data mining applications. It may significantly affect the performance of learning algorithms. In digital marketing applications, events of interest (positive instances for building predictive models) such as click and purchase are rare. A retail website can easily receive a million visits every day, yet only a small percentage of visits lead to purchase. The large amount of raw data and the small percentage of positive instances make it challenging to build decent predictive models in a timely fashion. In this paper, we propose an adaptive sampling strategy to deal with this problem. It efficiently returns high quality training data, ensures system responsiveness and improves predictive performances.

Cite this Paper

BibTeX


@InProceedings{pmlr-v77-zhang17b,
  title = 	 {Adaptive Sampling Scheme for Learning in Severely Imbalanced Large Scale Data},
  author = 	 {Zhang, Wei and Kobeissi, Said and Tomko, Scott and Challis, Chris},
  booktitle = 	 {Proceedings of the Ninth Asian Conference on Machine Learning},
  pages = 	 {240--247},
  year = 	 {2017},
  editor = 	 {Zhang, Min-Ling and Noh, Yung-Kyun},
  volume = 	 {77},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Yonsei University, Seoul, Republic of Korea},
  month = 	 {15--17 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v77/zhang17b/zhang17b.pdf},
  url = 	 {https://proceedings.mlr.press/v77/zhang17b.html},
  abstract = 	 {Imbalanced data poses a serious challenge for many machine learning and data mining applications. It may significantly affect the performance of learning algorithms. In digital marketing applications, events of interest (positive instances for building predictive models) such as click and purchase are rare. A retail website can easily receive a million visits every day, yet only a small percentage of visits lead to purchase. The large amount of raw data and the small percentage of positive instances make it challenging to build decent predictive models in a timely fashion. In this paper, we propose an adaptive sampling strategy to deal with this problem. It efficiently returns high quality training data, ensures system responsiveness and improves predictive performances.}
}

Endnote

%0 Conference Paper
%T Adaptive Sampling Scheme for Learning in Severely Imbalanced Large Scale Data
%A Wei Zhang
%A Said Kobeissi
%A Scott Tomko
%A Chris Challis
%B Proceedings of the Ninth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Min-Ling Zhang
%E Yung-Kyun Noh
%F pmlr-v77-zhang17b
%I PMLR
%P 240--247
%U https://proceedings.mlr.press/v77/zhang17b.html
%V 77
%X Imbalanced data poses a serious challenge for many machine learning and data mining applications. It may significantly affect the performance of learning algorithms. In digital marketing applications, events of interest (positive instances for building predictive models) such as click and purchase are rare. A retail website can easily receive a million visits every day, yet only a small percentage of visits lead to purchase. The large amount of raw data and the small percentage of positive instances make it challenging to build decent predictive models in a timely fashion. In this paper, we propose an adaptive sampling strategy to deal with this problem. It efficiently returns high quality training data, ensures system responsiveness and improves predictive performances.

APA


Zhang, W., Kobeissi, S., Tomko, S., & Challis, C. (2017). Adaptive Sampling Scheme for Learning in Severely Imbalanced Large Scale Data. Proceedings of the Ninth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 77:240-247. Available from https://proceedings.mlr.press/v77/zhang17b.html.
