Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking

Albert Bifet; Eibe Frank; Geoffrey Holmes; Bernhard Pfahringer

Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking

Albert Bifet, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer

Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:225-240, 2010.

Abstract

The success of simple methods for classification shows that is is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this paper, we propose to exploit this phenomenon in the data stream context by building an ensemble of Hoeffding trees that are each limited to a small subset of attributes. In this way, each tree is restricted to model interactions between attributes in its corresponding subset. Because it is not known a priori which attribute subsets are relevant for prediction, we build exhaustive ensembles that consider all possible attribute subsets of a given size. As the resulting Hoeffding trees are not all equally important, we weigh them in a suitable manner to obtain accurate classifications. This is done by combining the log-odds of their probability estimates using sigmoid perceptrons, with one perceptron per class. We propose a mechanism for setting the perceptrons’ learning rate using the ADWIN change detection method for data streams, and also use ADWIN to reset ensemble members (i.e. Hoeffding trees) when they no longer perform well. Our experiments show that the resulting ensemble classifier outperforms bagging for data streams in terms of accuracy when both are used in conjunction with adaptive naive Bayes Hoeffding trees, at the expense of runtime and memory consumption.

Cite this Paper

BibTeX


@InProceedings{pmlr-v13-bifet10a,
  title = 	 {Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking},
  author = 	 {Bifet, Albert and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard},
  booktitle = 	 {Proceedings of 2nd Asian Conference on Machine Learning},
  pages = 	 {225--240},
  year = 	 {2010},
  editor = 	 {Sugiyama, Masashi and Yang, Qiang},
  volume = 	 {13},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Tokyo, Japan},
  month = 	 {08--10 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v13/bifet10a/bifet10a.pdf},
  url = 	 {https://proceedings.mlr.press/v13/bifet10a.html},
  abstract = 	 {The success of simple methods for classification shows that is is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this paper, we propose to exploit this phenomenon in the data stream context by building an ensemble of Hoeffding trees that are each limited to a small subset of attributes. In this way, each tree is restricted to model interactions between attributes in its corresponding subset. Because it is not known a priori which attribute subsets are relevant for prediction, we build exhaustive ensembles that consider all possible attribute subsets of a given size. As the resulting Hoeffding trees are not all equally important, we weigh them in a suitable manner to obtain accurate classifications. This is done by combining the log-odds of their probability estimates using sigmoid perceptrons, with one perceptron per class. We propose a mechanism for setting the perceptrons’ learning rate using the ADWIN change detection method for data streams, and also use ADWIN to reset ensemble members (i.e. Hoeffding trees) when they no longer perform well. Our experiments show that the resulting ensemble classifier outperforms bagging for data streams in terms of accuracy when both are used in conjunction with adaptive naive Bayes Hoeffding trees, at the expense of runtime and memory consumption.}
}

Endnote

%0 Conference Paper
%T Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking
%A Albert Bifet
%A Eibe Frank
%A Geoffrey Holmes
%A Bernhard Pfahringer
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang	
%F pmlr-v13-bifet10a
%I PMLR
%P 225--240
%U https://proceedings.mlr.press/v13/bifet10a.html
%V 13
%X The success of simple methods for classification shows that is is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this paper, we propose to exploit this phenomenon in the data stream context by building an ensemble of Hoeffding trees that are each limited to a small subset of attributes. In this way, each tree is restricted to model interactions between attributes in its corresponding subset. Because it is not known a priori which attribute subsets are relevant for prediction, we build exhaustive ensembles that consider all possible attribute subsets of a given size. As the resulting Hoeffding trees are not all equally important, we weigh them in a suitable manner to obtain accurate classifications. This is done by combining the log-odds of their probability estimates using sigmoid perceptrons, with one perceptron per class. We propose a mechanism for setting the perceptrons’ learning rate using the ADWIN change detection method for data streams, and also use ADWIN to reset ensemble members (i.e. Hoeffding trees) when they no longer perform well. Our experiments show that the resulting ensemble classifier outperforms bagging for data streams in terms of accuracy when both are used in conjunction with adaptive naive Bayes Hoeffding trees, at the expense of runtime and memory consumption.

RIS


TY  - CPAPER
TI  - Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking
AU  - Albert Bifet
AU  - Eibe Frank
AU  - Geoffrey Holmes
AU  - Bernhard Pfahringer
BT  - Proceedings of 2nd Asian Conference on Machine Learning
DA  - 2010/10/31
ED  - Masashi Sugiyama
ED  - Qiang Yang	
ID  - pmlr-v13-bifet10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 13
SP  - 225
EP  - 240
L1  - http://proceedings.mlr.press/v13/bifet10a/bifet10a.pdf
UR  - https://proceedings.mlr.press/v13/bifet10a.html
AB  - The success of simple methods for classification shows that is is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this paper, we propose to exploit this phenomenon in the data stream context by building an ensemble of Hoeffding trees that are each limited to a small subset of attributes. In this way, each tree is restricted to model interactions between attributes in its corresponding subset. Because it is not known a priori which attribute subsets are relevant for prediction, we build exhaustive ensembles that consider all possible attribute subsets of a given size. As the resulting Hoeffding trees are not all equally important, we weigh them in a suitable manner to obtain accurate classifications. This is done by combining the log-odds of their probability estimates using sigmoid perceptrons, with one perceptron per class. We propose a mechanism for setting the perceptrons’ learning rate using the ADWIN change detection method for data streams, and also use ADWIN to reset ensemble members (i.e. Hoeffding trees) when they no longer perform well. Our experiments show that the resulting ensemble classifier outperforms bagging for data streams in terms of accuracy when both are used in conjunction with adaptive naive Bayes Hoeffding trees, at the expense of runtime and memory consumption.
ER  -

APA


Bifet, A., Frank, E., Holmes, G. & Pfahringer, B.. (2010). Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:225-240 Available from https://proceedings.mlr.press/v13/bifet10a.html.

Related Material

Download PDF