Online Multiclass Boosting with Bandit Feedback

Daniel T. Zhang; Young Hun Jung; Ambuj Tewari

Online Multiclass Boosting with Bandit Feedback

Daniel T. Zhang, Young Hun Jung, Ambuj Tewari

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1148-1156, 2019.

Abstract

We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full information counterparts. The cost of restricted feedback is reflected in the larger sample complexity. Experimental results also support our theoretical findings, and performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to use binary weak learners.

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-zhang19e,
  title = 	 {Online Multiclass Boosting with Bandit Feedback},
  author =       {Zhang, Daniel T. and Jung, Young Hun and Tewari, Ambuj},
  booktitle = 	 {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1148--1156},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = 	 {89},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v89/zhang19e/zhang19e.pdf},
  url = 	 {https://proceedings.mlr.press/v89/zhang19e.html},
  abstract = 	 {We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full information counterparts. The cost of restricted feedback is reflected in the larger sample complexity. Experimental results also support our theoretical findings, and performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to use binary weak learners.}
}

Endnote

%0 Conference Paper
%T Online Multiclass Boosting with Bandit Feedback
%A Daniel T. Zhang
%A Young Hun Jung
%A Ambuj Tewari
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama	
%F pmlr-v89-zhang19e
%I PMLR
%P 1148--1156
%U https://proceedings.mlr.press/v89/zhang19e.html
%V 89
%X We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full information counterparts. The cost of restricted feedback is reflected in the larger sample complexity. Experimental results also support our theoretical findings, and performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to use binary weak learners.

APA


Zhang, D.T., Jung, Y.H. & Tewari, A.. (2019). Online Multiclass Boosting with Bandit Feedback. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1148-1156 Available from https://proceedings.mlr.press/v89/zhang19e.html.

Online Multiclass Boosting with Bandit Feedback

Abstract

Cite this Paper

Related Material