On Boosting and the Exponential Loss

Abraham J. Wyner
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:323-329, 2003.

Abstract

Boosting algorithms in general, and AdaBoost in particular, initially baffled the statistical world by posing two questions: (1) Why does AdaBoost perform so well? and (2) What makes Boosting methods resistant to overfitting? In response to question (1), Friedman, Hastie and Tibshirani (2000) take a statistical view of Boosting, recasting it as a stagewise approach to the minimization of an exponential loss function by means of an additive model, in a process similar to additive logistic regression. This characterization has since become widely accepted in the statistics and computer science communities as the best statistical answer to question (1). In this paper, we argue that this well-assimilated view is questionable, and that Boosting’s success may have nothing to do with the minimization of an exponential criterion, or indeed with any optimization at all. Our argument rests on a constructive theorem stating that for any sequence of classifiers there exists a linear combination for which the exponential criterion equals one. Furthermore, we present a Boosting algorithm that performs empirically like AdaBoost while stabilizing the exponential loss at a constant.
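For context (standard notation following Friedman, Hastie and Tibshirani (2000), not drawn verbatim from this paper): given training pairs (x_i, y_i) with labels y_i in {-1, +1} and an additive model F(x) = sum_m alpha_m f_m(x), the exponential criterion is L(F) = (1/n) sum_i exp(-y_i F(x_i)). Note that L(0) = 1, the value the theorem above says a suitable linear combination can maintain. The sketch below is a minimal, runnable illustration of discrete AdaBoost on decision stumps that prints this criterion after each round; it illustrates the standard algorithm, not the paper's loss-stabilizing variant, and every name in it is ours.

import numpy as np

def stump(X, j, t, s):
    # Decision stump: predict s if X[:, j] > t, else -s (labels in {-1, +1}).
    return s * np.where(X[:, j] > t, 1.0, -1.0)

def best_stump(X, y, w):
    # Exhaustive search for the stump with smallest weighted error under w.
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                err = np.sum(w[stump(X, j, t, s) != y])
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best, best_err

def adaboost(X, y, rounds=20):
    n = len(y)
    w = np.full(n, 1.0 / n)          # example weights
    F = np.zeros(n)                  # additive score F(x_i) on the training points
    for m in range(rounds):
        (j, t, s), eps = best_stump(X, y, w)
        eps = np.clip(eps, 1e-12, 1 - 1e-12)    # guard against zero weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)   # AdaBoost's stagewise coefficient
        h = stump(X, j, t, s)
        F += alpha * h
        w *= np.exp(-alpha * y * h)             # reweight misclassified examples up
        w /= w.sum()
        # The exponential criterion the statistical view says AdaBoost minimizes:
        print(f"round {m + 1:2d}  eps={eps:.3f}  exp. loss={np.mean(np.exp(-y * F)):.4f}")
    return F

# Toy usage: a linearly separable two-feature problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
adaboost(X, y)

Under the statistical view, the printed exponential loss falls round by round (each stage multiplies it by 2*sqrt(eps*(1 - eps)) <= 1); the paper's point is that an ensemble can behave empirically like AdaBoost even when this quantity is held fixed at one.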

Cite this Paper


BibTeX
@InProceedings{pmlr-vR4-wyner03a,
  title     = {On Boosting and the Exponential Loss},
  author    = {Wyner, Abraham J.},
  booktitle = {Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics},
  pages     = {323--329},
  year      = {2003},
  editor    = {Bishop, Christopher M. and Frey, Brendan J.},
  volume    = {R4},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Jan},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/r4/wyner03a/wyner03a.pdf},
  url       = {https://proceedings.mlr.press/r4/wyner03a.html},
  abstract  = {Boosting algorithms in general, and AdaBoost in particular, initially baffled the statistical world by posing two questions: (1) Why does AdaBoost perform so well? and (2) What makes Boosting methods resistant to overfitting? In response to question (1), Friedman, Hastie and Tibshirani (2000) take a statistical view of Boosting, recasting it as a stagewise approach to the minimization of an exponential loss function by means of an additive model, in a process similar to additive logistic regression. This characterization has since become widely accepted in the statistics and computer science communities as the best statistical answer to question (1). In this paper, we argue that this well-assimilated view is questionable, and that Boosting’s success may have nothing to do with the minimization of an exponential criterion, or indeed with any optimization at all. Our argument rests on a constructive theorem stating that for any sequence of classifiers there exists a linear combination for which the exponential criterion equals one. Furthermore, we present a Boosting algorithm that performs empirically like AdaBoost while stabilizing the exponential loss at a constant.},
  note      = {Reissued by PMLR on 01 April 2021.}
}
APA
Wyner, A. J. (2003). On Boosting and the Exponential Loss. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R4:323-329. Available from https://proceedings.mlr.press/r4/wyner03a.html. Reissued by PMLR on 01 April 2021.
