Feature Selection using Multiple Streams

Paramveer Dhillon, Dean Foster, Lyle Ungar
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:153-160, 2010.

Abstract

Feature selection for supervised learning can be greatly improved by making use of the fact that features often come in classes. For example, in gene expression data, the genes which serve as features may be divided into classes based on their membership in gene families or pathways. When labeling words with senses for word sense disambiguation, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are of different sizes and have different (but unknown) fractions of good features. Experimental results show that our approach provides significant improvement in performance and is computationally less expensive than comparable “batch” methods that do not take advantage of the feature classes and expect all features to be known in advance.
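The streamwise idea in the abstract can be illustrated with a small sketch. This is not the authors' exact algorithm: it is a simplified alpha-investing variant in which each feature class ("stream") keeps its own alpha-wealth, features are drawn from the wealthiest stream, and a feature is kept when its F-test p-value beats the alpha spent on it. All function names, wealth parameters, and the stream-choice rule here are our assumptions for illustration.

```python
import numpy as np
from scipy import stats

def f_test_pvalue(X_sel, x_new, y):
    """P-value of an F-test for adding one column x_new to the
    least-squares model spanned by X_sel (plus an intercept)."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), X_sel]) if X_sel.size else np.ones((n, 1))
    X1 = np.column_stack([X0, x_new])
    rss0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
    rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    df = n - X1.shape[1]
    if rss1 <= 0 or df <= 0:
        return 0.0
    f = (rss0 - rss1) / (rss1 / df)
    return stats.f.sf(f, 1, df)

def multi_stream_select(streams, y, w0=0.25, payout=0.25):
    """Streamwise selection with a separate alpha-investing wealth per
    stream. `streams` maps a stream name to a list of feature columns;
    streams with more good features earn wealth and get tested more."""
    wealth = {s: w0 for s in streams}            # per-stream alpha-wealth
    queues = {s: list(cols) for s, cols in streams.items()}
    tests = {s: 0 for s in streams}
    selected = []
    while any(queues.values()):
        # Draw the next candidate from the wealthiest non-empty stream.
        s = max((s for s in queues if queues[s]), key=lambda s: wealth[s])
        if wealth[s] <= 0:
            break                                 # every stream is broke
        x = queues[s].pop(0)
        tests[s] += 1
        alpha = wealth[s] / (2 * tests[s])        # spend part of the wealth
        X_sel = np.column_stack(selected) if selected else np.empty((len(y), 0))
        if f_test_pvalue(X_sel, x, y) < alpha:
            selected.append(x)
            wealth[s] += payout                   # reward a selected feature
        else:
            wealth[s] -= alpha / (1 - alpha)      # pay for the failed test
    return selected
```

On synthetic data where one stream holds informative columns and another holds noise, the informative stream retains wealth and contributes most of the selected features, while the noise stream's budget drains, which is the qualitative behavior the abstract describes.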

Cite this Paper


BibTeX
@InProceedings{pmlr-v9-dhillon10a,
  title     = {Feature Selection using Multiple Streams},
  author    = {Dhillon, Paramveer and Foster, Dean and Ungar, Lyle},
  booktitle = {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {153--160},
  year      = {2010},
  editor    = {Teh, Yee Whye and Titterington, Mike},
  volume    = {9},
  series    = {Proceedings of Machine Learning Research},
  address   = {Chia Laguna Resort, Sardinia, Italy},
  month     = {13--15 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v9/dhillon10a/dhillon10a.pdf},
  url       = {http://proceedings.mlr.press/v9/dhillon10a.html},
  abstract  = {Feature selection for supervised learning can be greatly improved by making use of the fact that features often come in classes. For example, in gene expression data, the genes which serve as features may be divided into classes based on their membership in gene families or pathways. When labeling words with senses for word sense disambiguation, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are of different sizes and have different (but unknown) fractions of good features. Experimental results show that our approach provides significant improvement in performance and is computationally less expensive than comparable “batch” methods that do not take advantage of the feature classes and expect all features to be known in advance.}
}
Endnote
%0 Conference Paper
%T Feature Selection using Multiple Streams
%A Paramveer Dhillon
%A Dean Foster
%A Lyle Ungar
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington
%F pmlr-v9-dhillon10a
%I PMLR
%P 153--160
%U http://proceedings.mlr.press/v9/dhillon10a.html
%V 9
%X Feature selection for supervised learning can be greatly improved by making use of the fact that features often come in classes. For example, in gene expression data, the genes which serve as features may be divided into classes based on their membership in gene families or pathways. When labeling words with senses for word sense disambiguation, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are of different sizes and have different (but unknown) fractions of good features. Experimental results show that our approach provides significant improvement in performance and is computationally less expensive than comparable “batch” methods that do not take advantage of the feature classes and expect all features to be known in advance.
RIS
TY  - CPAPER
TI  - Feature Selection using Multiple Streams
AU  - Paramveer Dhillon
AU  - Dean Foster
AU  - Lyle Ungar
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington
ID  - pmlr-v9-dhillon10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 153
EP  - 160
L1  - http://proceedings.mlr.press/v9/dhillon10a/dhillon10a.pdf
UR  - http://proceedings.mlr.press/v9/dhillon10a.html
AB  - Feature selection for supervised learning can be greatly improved by making use of the fact that features often come in classes. For example, in gene expression data, the genes which serve as features may be divided into classes based on their membership in gene families or pathways. When labeling words with senses for word sense disambiguation, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are of different sizes and have different (but unknown) fractions of good features. Experimental results show that our approach provides significant improvement in performance and is computationally less expensive than comparable “batch” methods that do not take advantage of the feature classes and expect all features to be known in advance.
ER  -
APA
Dhillon, P., Foster, D. & Ungar, L. (2010). Feature Selection using Multiple Streams. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:153-160. Available from http://proceedings.mlr.press/v9/dhillon10a.html.