Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance

Jie Shen; Chicheng Zhang

Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance

Jie Shen, Chicheng Zhang

Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:1072-1113, 2021.

Abstract

This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question of when and how $s$-sparse halfspaces can be efficiently learned under the challenging {\em malicious noise} model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity of $\tilde{O}(s \log^4\frac{d}{\epsilon})$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, under the assumption that the distribution over (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that the sample complexity is $\tilde{O}(\frac{1}{\epsilon}s^2 \log^5 d)$ which also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for soft outlier removal and for empirical risk minimization, and a new analysis of uniform concentration for unbounded instances – all of them crucially take the sparsity structure of the underlying halfspace into account.

Cite this Paper

BibTeX


@InProceedings{pmlr-v132-shen21a,
  title = 	 {Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance},
  author =       {Shen, Jie and Zhang, Chicheng},
  booktitle = 	 {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  pages = 	 {1072--1113},
  year = 	 {2021},
  editor = 	 {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan},
  volume = 	 {132},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--19 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v132/shen21a/shen21a.pdf},
  url = 	 {https://proceedings.mlr.press/v132/shen21a.html},
  abstract = 	 {This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question of when and how $s$-sparse halfspaces can be efficiently learned under the challenging {\em malicious noise} model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity of $\tilde{O}(s \log^4\frac{d}{\epsilon})$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, under the assumption that the distribution over (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that the sample complexity is $\tilde{O}(\frac{1}{\epsilon}s^2 \log^5 d)$ which also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for soft outlier removal and for empirical risk minimization, and a new analysis of uniform concentration for unbounded instances – all of them crucially take the sparsity structure of the underlying halfspace into account.}
}

Endnote

%0 Conference Paper
%T Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance
%A Jie Shen
%A Chicheng Zhang
%B Proceedings of the 32nd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Vitaly Feldman
%E Katrina Ligett
%E Sivan Sabato	
%F pmlr-v132-shen21a
%I PMLR
%P 1072--1113
%U https://proceedings.mlr.press/v132/shen21a.html
%V 132
%X This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question of when and how $s$-sparse halfspaces can be efficiently learned under the challenging {\em malicious noise} model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity of $\tilde{O}(s \log^4\frac{d}{\epsilon})$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, under the assumption that the distribution over (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that the sample complexity is $\tilde{O}(\frac{1}{\epsilon}s^2 \log^5 d)$ which also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for soft outlier removal and for empirical risk minimization, and a new analysis of uniform concentration for unbounded instances – all of them crucially take the sparsity structure of the underlying halfspace into account.

APA


Shen, J. & Zhang, C.. (2021). Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:1072-1113 Available from https://proceedings.mlr.press/v132/shen21a.html.

Related Material

Download PDF