The Rate of Convergence of Adaboost

Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:537-558, 2011.

Abstract

The AdaBoost algorithm of Freund and Schapire (1997) was designed to combine many “weak” hypotheses that perform slightly better than a random guess into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss”. Our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Specifically, our first result shows that at iteration t, the exponential loss of AdaBoost’s computed parameter vector will be at most ε more than that of any parameter vector of ℓ1-norm bounded by B, in a number of rounds bounded by a polynomial in B and 1/ε. We also provide lower-bound examples showing that a polynomial dependence on these parameters is necessary. Our second result is that within C/ε iterations, AdaBoost achieves a value of the exponential loss that is at most ε more than the best possible value, where C depends on the dataset. We show that this dependence of the rate on ε is optimal up to constant factors, i.e., at least Ω(1/ε) rounds are necessary to come within ε of the optimal exponential loss.
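As a concrete illustration of the object being analyzed, the following minimal Python sketch (not the authors' code) runs AdaBoost as coordinate descent on the exponential loss (1/m) Σ_i exp(-y_i F(x_i)) and records that loss after every round. The exhaustive decision-stump weak learner and all identifiers are illustrative assumptions.

import numpy as np

def adaboost_exp_loss(X, y, n_rounds):
    """Run AdaBoost with threshold stumps; return the exponential loss per round.

    X : (m, d) real feature matrix; y : (m,) labels in {-1, +1}.
    """
    m, d = X.shape
    F = np.zeros(m)                    # combined score F_t(x_i) on each example
    losses = []
    for _ in range(n_rounds):
        w = np.exp(-y * F)
        w /= w.sum()                   # AdaBoost's distribution over examples
        # Weak learner: exhaustively pick the stump with lowest weighted error.
        best_err, best_h = np.inf, None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for s in (1.0, -1.0):
                    h = s * np.where(X[:, j] <= thr, 1.0, -1.0)
                    err = w[h != y].sum()
                    if err < best_err:
                        best_err, best_h = err, h
        err = min(max(best_err, 1e-12), 1 - 1e-12)   # guard the log below
        alpha = 0.5 * np.log((1 - err) / err)        # AdaBoost's step size
        F += alpha * best_h                          # coordinate-descent update
        losses.append(np.mean(np.exp(-y * F)))       # exponential loss after this round
    return losses

On any small dataset with labels in {-1, +1}, the returned sequence of losses is nonincreasing and approaches the infimum of the exponential loss; the paper's results bound how many rounds suffice to come within ε of that infimum, even when no finite parameter vector attains it.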

Cite this Paper

BibTeX
@InProceedings{pmlr-v19-mukherjee11a,
  title     = {The Rate of Convergence of Adaboost},
  author    = {Mukherjee, Indraneel and Rudin, Cynthia and Schapire, Robert E.},
  booktitle = {Proceedings of the 24th Annual Conference on Learning Theory},
  pages     = {537--558},
  year      = {2011},
  editor    = {Kakade, Sham M. and von Luxburg, Ulrike},
  volume    = {19},
  series    = {Proceedings of Machine Learning Research},
  address   = {Budapest, Hungary},
  month     = {09--11 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v19/mukherjee11a/mukherjee11a.pdf},
  url       = {https://proceedings.mlr.press/v19/mukherjee11a.html},
  abstract  = {The AdaBoost algorithm of Freund and Schapire (1997) was designed to combine many “weak” hypotheses that perform slightly better than a random guess into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss”. Our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Specifically, our first result shows that at iteration $t$, the exponential loss of AdaBoost’s computed parameter vector will be at most $\varepsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$, in a number of rounds bounded by a polynomial in $B$ and $1/\varepsilon$. We also provide lower-bound examples showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\varepsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\varepsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\varepsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\varepsilon)$ rounds are necessary to come within $\varepsilon$ of the optimal exponential loss.}
}
Endnote
%0 Conference Paper
%T The Rate of Convergence of Adaboost
%A Indraneel Mukherjee
%A Cynthia Rudin
%A Robert E. Schapire
%B Proceedings of the 24th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2011
%E Sham M. Kakade
%E Ulrike von Luxburg
%F pmlr-v19-mukherjee11a
%I PMLR
%P 537--558
%U https://proceedings.mlr.press/v19/mukherjee11a.html
%V 19
%X The AdaBoost algorithm of Freund and Schapire (1997) was designed to combine many “weak” hypotheses that perform slightly better than a random guess into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss”. Our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Specifically, our first result shows that at iteration $t$, the exponential loss of AdaBoost’s computed parameter vector will be at most $\varepsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$, in a number of rounds bounded by a polynomial in $B$ and $1/\varepsilon$. We also provide lower-bound examples showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\varepsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\varepsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\varepsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\varepsilon)$ rounds are necessary to come within $\varepsilon$ of the optimal exponential loss.
RIS
TY - CPAPER
TI - The Rate of Convergence of Adaboost
AU - Indraneel Mukherjee
AU - Cynthia Rudin
AU - Robert E. Schapire
BT - Proceedings of the 24th Annual Conference on Learning Theory
DA - 2011/12/21
ED - Sham M. Kakade
ED - Ulrike von Luxburg
ID - pmlr-v19-mukherjee11a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 19
SP - 537
EP - 558
L1 - http://proceedings.mlr.press/v19/mukherjee11a/mukherjee11a.pdf
UR - https://proceedings.mlr.press/v19/mukherjee11a.html
AB - The AdaBoost algorithm of Freund and Schapire (1997) was designed to combine many “weak” hypotheses that perform slightly better than a random guess into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss”. Our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Specifically, our first result shows that at iteration $t$, the exponential loss of AdaBoost’s computed parameter vector will be at most $\varepsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$, in a number of rounds bounded by a polynomial in $B$ and $1/\varepsilon$. We also provide lower-bound examples showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\varepsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\varepsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\varepsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\varepsilon)$ rounds are necessary to come within $\varepsilon$ of the optimal exponential loss.
ER -
APA
Mukherjee, I., Rudin, C. & Schapire, R. E. (2011). The Rate of Convergence of Adaboost. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:537-558. Available from https://proceedings.mlr.press/v19/mukherjee11a.html.