Online Linear Classification with Massart Noise

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:13679-13692, 2025.

Abstract

We study the task of online learning in the presence of Massart noise. Specifically, instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\boldsymbol{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\boldsymbol{x}$ with unknown probability at most $\eta$. We focus on the fundamental class of $\gamma$-margin linear classifiers and present the first computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. We point out that the mistake bound achieved by our algorithm is qualitatively tight for computationally efficient algorithms; this follows from the fact that, even in the offline setting, achieving 0-1 error better than $\eta$ requires super-polynomial time under standard complexity assumptions. We extend our online learning model to a $k$-arm contextual bandit setting where the rewards (instead of satisfying commonly used realizability assumptions) are consistent, in expectation, with some linear ranking function with weight vector $\boldsymbol{w}^\ast$. Given a list of contexts $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_k$, if $\boldsymbol{w}^\ast \cdot \boldsymbol{x}_i > \boldsymbol{w}^\ast \cdot \boldsymbol{x}_j$, then the expected reward of action $i$ must be larger than that of action $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm whose expected reward exceeds that of choosing a random action at every round by at least $(1-1/k) \Delta T - o(T)$.
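To make the online model concrete, the following Python sketch simulates the interaction protocol described in the abstract: at each round the learner receives a context with margin at least $\gamma$ with respect to a hidden unit vector $\boldsymbol{w}^\ast$, and the observed label disagrees with the ground-truth label $\mathrm{sign}(\boldsymbol{w}^\ast \cdot \boldsymbol{x})$ with probability at most $\eta$. The Perceptron update used here is only a placeholder learner to illustrate the loop and the mistake count; it is not the algorithm of the paper, and the dimension, horizon, margin, and noise level are arbitrary values chosen for this example.

import numpy as np

rng = np.random.default_rng(0)
d, T, gamma, eta = 10, 5000, 0.1, 0.2   # example parameters (arbitrary)

# Hidden ground-truth halfspace w* (unknown to the learner).
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

w = np.zeros(d)   # learner's current hypothesis
mistakes = 0      # mistakes counted against the observed (noisy) labels

for t in range(T):
    # Context with margin at least gamma w.r.t. w*; drawn randomly here for
    # simplicity, although the model allows x_t to be chosen adversarially.
    while True:
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)
        if abs(w_star @ x) >= gamma:
            break

    y_clean = 1.0 if w_star @ x > 0 else -1.0
    # Massart noise: the observed label disagrees with the clean label with
    # probability eta_t <= eta; eta_t may depend on x (uniform in [0, eta] here).
    eta_t = rng.uniform(0.0, eta)
    y = y_clean if rng.random() > eta_t else -y_clean

    y_hat = 1.0 if w @ x > 0 else -1.0   # predict, breaking ties as -1
    if y_hat != y:
        mistakes += 1
        w += y * x                        # Perceptron update (placeholder only)

print(f"observed mistakes after T={T}: {mistakes}  (eta*T = {eta * T:.0f})")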

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-diakonikolas25e,
  title     = {Online Linear Classification with Massart Noise},
  author    = {Diakonikolas, Ilias and Kontonis, Vasilis and Tzamos, Christos and Zarifis, Nikos},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {13679--13692},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/diakonikolas25e/diakonikolas25e.pdf},
  url       = {https://proceedings.mlr.press/v267/diakonikolas25e.html},
  abstract  = {We study the task of online learning in the presence of Massart noise. Specifically, instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\boldsymbol{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\boldsymbol{x}$ with unknown probability at most $\eta$. We focus on the fundamental class of $\gamma$-margin linear classifiers and present the first computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. We point out that the mistake bound achieved by our algorithm is qualitatively tight for computationally efficient algorithms; this follows from the fact that, even in the offline setting, achieving 0-1 error better than $\eta$ requires super-polynomial time under standard complexity assumptions. We extend our online learning model to a $k$-arm contextual bandit setting where the rewards—instead of satisfying commonly used realizability assumptions—are consistent, in expectation, with some linear ranking function with weight vector $\boldsymbol{w}^\ast$. Given a list of contexts $\boldsymbol{x}_1,\ldots \boldsymbol{x}_k$, if $\boldsymbol{w}^*\cdot \boldsymbol{x}_i > \boldsymbol{w}^* \cdot \boldsymbol{x}_j$, the expected reward of action $i$ must be larger than that of $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k) \Delta T - o(T)$ bigger than choosing a random action at every round.}
}
Endnote
%0 Conference Paper
%T Online Linear Classification with Massart Noise
%A Ilias Diakonikolas
%A Vasilis Kontonis
%A Christos Tzamos
%A Nikos Zarifis
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-diakonikolas25e
%I PMLR
%P 13679--13692
%U https://proceedings.mlr.press/v267/diakonikolas25e.html
%V 267
%X We study the task of online learning in the presence of Massart noise. Specifically, instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\boldsymbol{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\boldsymbol{x}$ with unknown probability at most $\eta$. We focus on the fundamental class of $\gamma$-margin linear classifiers and present the first computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. We point out that the mistake bound achieved by our algorithm is qualitatively tight for computationally efficient algorithms; this follows from the fact that, even in the offline setting, achieving 0-1 error better than $\eta$ requires super-polynomial time under standard complexity assumptions. We extend our online learning model to a $k$-arm contextual bandit setting where the rewards—instead of satisfying commonly used realizability assumptions—are consistent, in expectation, with some linear ranking function with weight vector $\boldsymbol{w}^\ast$. Given a list of contexts $\boldsymbol{x}_1,\ldots \boldsymbol{x}_k$, if $\boldsymbol{w}^*\cdot \boldsymbol{x}_i > \boldsymbol{w}^* \cdot \boldsymbol{x}_j$, the expected reward of action $i$ must be larger than that of $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k) \Delta T - o(T)$ bigger than choosing a random action at every round.
APA
Diakonikolas, I., Kontonis, V., Tzamos, C. & Zarifis, N. (2025). Online Linear Classification with Massart Noise. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:13679-13692. Available from https://proceedings.mlr.press/v267/diakonikolas25e.html.