Submodularity in Data Subset Selection and Active Learning

Kai Wei; Rishabh Iyer; Jeff Bilmes

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1954-1963, 2015.

Abstract

We study the problem of selecting a subset of big data to train a classifier while incurring minimal performance loss. We show the connection of submodularity to the data likelihood functions for Naive Bayes (NB) and Nearest Neighbor (NN) classifiers, and formulate the data subset selection problems for these classifiers as constrained submodular maximization. Furthermore, we apply this framework to active learning and propose a novel scheme, filtering active submodular selection (FASS), in which we combine the uncertainty sampling method with a submodular data subset selection framework. We extensively evaluate the proposed framework on text categorization and handwritten digit recognition tasks with four different classifiers, including Deep Neural Network (DNN) based classifiers. Empirical results indicate that the proposed framework yields significant improvement over the state-of-the-art algorithms on all classifiers.
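To make "constrained submodular maximization" concrete, here is a minimal sketch of the standard greedy algorithm under a cardinality constraint. It is not the paper's method: the abstract does not give the NB or NN objectives, so a facility-location function over an RBF similarity matrix is used as an assumed stand-in, and the names `rbf_similarity`, `facility_location`, and `greedy_select` are hypothetical. For a monotone submodular objective, this greedy procedure is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
import math


def rbf_similarity(points, gamma=1.0):
    """Pairwise RBF similarities (an assumed, non-negative similarity measure)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def facility_location(sim, subset):
    """f(S) = sum_i max_{j in S} sim[i][j], with f(empty set) = 0.

    Assumed stand-in objective; monotone submodular when sim is non-negative.
    """
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def greedy_select(sim, budget):
    """Greedily pick up to `budget` indices, each time adding the largest marginal gain."""
    n = len(sim)
    selected = []
    cover = [0.0] * n  # cover[i] = best similarity of point i to the selected set
    for _ in range(min(budget, n)):
        best_j, best_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gain = sum(max(sim[i][j] - cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        cover = [max(cover[i], sim[i][best_j]) for i in range(n)]
    return selected


# Toy data (illustrative only): two well-separated clusters.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
sim = rbf_similarity(points)
chosen = greedy_select(sim, budget=2)
print(chosen, facility_location(sim, chosen))
```

On this toy input the greedy rule picks one representative from each cluster, which is the diversity behavior that makes submodular objectives attractive for subset selection.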

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-wei15,
  title = 	 {Submodularity in Data Subset Selection and Active Learning},
  author = 	 {Wei, Kai and Iyer, Rishabh and Bilmes, Jeff},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1954--1963},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/wei15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/wei15.html},
  abstract = 	 {We study the problem of selecting a subset of big data to train a classifier while incurring minimal performance loss. We show the connection of submodularity to the data likelihood functions for Naive Bayes (NB) and Nearest Neighbor (NN) classifiers, and formulate the data subset selection problems for these classifiers as constrained submodular maximization. Furthermore, we apply this framework to active learning and propose a novel scheme filtering active submodular selection (FASS), where we combine the uncertainty sampling method with a submodular data subset selection framework. We extensively evaluate the proposed framework on text categorization and handwritten digit recognition tasks with four different classifiers, including Deep Neural Network (DNN) based classifiers. Empirical results indicate that the proposed framework yields significant improvement over the state-of-the-art algorithms on all classifiers.}
}

Endnote

%0 Conference Paper
%T Submodularity in Data Subset Selection and Active Learning
%A Kai Wei
%A Rishabh Iyer
%A Jeff Bilmes
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-wei15
%I PMLR
%P 1954--1963
%U https://proceedings.mlr.press/v37/wei15.html
%V 37
%X We study the problem of selecting a subset of big data to train a classifier while incurring minimal performance loss. We show the connection of submodularity to the data likelihood functions for Naive Bayes (NB) and Nearest Neighbor (NN) classifiers, and formulate the data subset selection problems for these classifiers as constrained submodular maximization. Furthermore, we apply this framework to active learning and propose a novel scheme filtering active submodular selection (FASS), where we combine the uncertainty sampling method with a submodular data subset selection framework. We extensively evaluate the proposed framework on text categorization and handwritten digit recognition tasks with four different classifiers, including Deep Neural Network (DNN) based classifiers. Empirical results indicate that the proposed framework yields significant improvement over the state-of-the-art algorithms on all classifiers.

RIS


TY  - CPAPER
TI  - Submodularity in Data Subset Selection and Active Learning
AU  - Kai Wei
AU  - Rishabh Iyer
AU  - Jeff Bilmes
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-wei15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1954
EP  - 1963
L1  - http://proceedings.mlr.press/v37/wei15.pdf
UR  - https://proceedings.mlr.press/v37/wei15.html
AB  - We study the problem of selecting a subset of big data to train a classifier while incurring minimal performance loss. We show the connection of submodularity to the data likelihood functions for Naive Bayes (NB) and Nearest Neighbor (NN) classifiers, and formulate the data subset selection problems for these classifiers as constrained submodular maximization. Furthermore, we apply this framework to active learning and propose a novel scheme filtering active submodular selection (FASS), where we combine the uncertainty sampling method with a submodular data subset selection framework. We extensively evaluate the proposed framework on text categorization and handwritten digit recognition tasks with four different classifiers, including Deep Neural Network (DNN) based classifiers. Empirical results indicate that the proposed framework yields significant improvement over the state-of-the-art algorithms on all classifiers.
ER  -

APA


Wei, K., Iyer, R. & Bilmes, J. (2015). Submodularity in Data Subset Selection and Active Learning. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1954-1963. Available from https://proceedings.mlr.press/v37/wei15.html.
