The Rate of Convergence of AdaBoost

Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:537-558, 2011.

Abstract

The AdaBoost algorithm of Freund and Schapire (1997) was designed to combine many “weak” hypotheses that perform slightly better than a random guess into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss.” Our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Specifically, our first result shows that at iteration $t$, the exponential loss of AdaBoost’s computed parameter vector will be at most $\varepsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$ in a number of rounds that is bounded by a polynomial in $B$ and $1/\varepsilon$. We also provide lower bound examples showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\varepsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\varepsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\varepsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\varepsilon)$ rounds are necessary to achieve within $\varepsilon$ of the optimal exponential loss.
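
For readers who want a concrete picture of the procedure being analyzed, here is a minimal sketch (not taken from the paper) of AdaBoost viewed as greedy coordinate descent on the exponential loss. It assumes the weak hypotheses are supplied up front as a matrix of ±1 predictions; the function names and the synthetic demo at the end are illustrative assumptions, not the authors' setup.

import numpy as np

def exponential_loss(H, y, lam):
    # Average exponential loss (1/m) * sum_i exp(-y_i * sum_j lam_j * h_j(x_i)),
    # where H[i, j] = h_j(x_i) in {-1, +1} and y[i] in {-1, +1}.
    return float(np.mean(np.exp(-y * (H @ lam))))

def adaboost(H, y, num_rounds):
    # Greedy coordinate descent on the exponential loss: each round picks the
    # weak hypothesis with the smallest weighted error under the current
    # distribution D over examples and takes the standard AdaBoost step
    # along that coordinate.
    m, N = H.shape
    lam = np.zeros(N)            # combined parameter (weight) vector
    D = np.full(m, 1.0 / m)      # distribution over training examples
    for _ in range(num_rounds):
        # Weighted error of each weak hypothesis under D.
        errs = np.array([D[H[:, j] != y].sum() for j in range(N)])
        j = int(np.argmin(errs))
        err = float(np.clip(errs[j], 1e-12, 1 - 1e-12))  # guard against infinite steps
        alpha = 0.5 * np.log((1.0 - err) / err)
        lam[j] += alpha
        # Reweight the examples and renormalize, as in standard AdaBoost.
        D = D * np.exp(-alpha * y * H[:, j])
        D = D / D.sum()
    return lam

if __name__ == "__main__":
    # Tiny synthetic demo: random +/-1 "weak hypothesis" predictions and labels.
    rng = np.random.default_rng(0)
    m, N = 200, 50
    H = rng.choice([-1.0, 1.0], size=(m, N))
    y = rng.choice([-1.0, 1.0], size=m)
    lam = adaboost(H, y, num_rounds=100)
    print("exponential loss after 100 rounds:", exponential_loss(H, y, lam))

In this view, each round is an exact line search along a single coordinate of the exponential loss, and the quantity exponential_loss(H, y, lam) after t rounds is the one whose convergence rate the paper bounds.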

Cite this Paper


BibTeX
@InProceedings{pmlr-v19-mukherjee11a,
  title     = {The Rate of Convergence of AdaBoost},
  author    = {Mukherjee, Indraneel and Rudin, Cynthia and Schapire, Robert E.},
  booktitle = {Proceedings of the 24th Annual Conference on Learning Theory},
  pages     = {537--558},
  year      = {2011},
  editor    = {Kakade, Sham M. and von Luxburg, Ulrike},
  volume    = {19},
  series    = {Proceedings of Machine Learning Research},
  address   = {Budapest, Hungary},
  month     = {09--11 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v19/mukherjee11a/mukherjee11a.pdf},
  url       = {https://proceedings.mlr.press/v19/mukherjee11a.html}
}
APA
Mukherjee, I., Rudin, C., & Schapire, R. E. (2011). The Rate of Convergence of AdaBoost. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:537-558. Available from https://proceedings.mlr.press/v19/mukherjee11a.html.
