Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance

Jie Shen, Chicheng Zhang
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:1072-1113, 2021.

Abstract

This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question of when and how $s$-sparse halfspaces can be efficiently learned under the challenging {\em malicious noise} model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity of $\tilde{O}(s \log^4\frac{d}{\epsilon})$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, under the assumption that the distribution over (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that the sample complexity is $\tilde{O}(\frac{1}{\epsilon}s^2 \log^5 d)$ which also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for soft outlier removal and for empirical risk minimization, and a new analysis of uniform concentration for unbounded instances – all of them crucially take the sparsity structure of the underlying halfspace into account.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-shen21a, title = {Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance}, author = {Shen, Jie and Zhang, Chicheng}, booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory}, pages = {1072--1113}, year = {2021}, editor = {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan}, volume = {132}, series = {Proceedings of Machine Learning Research}, month = {16--19 Mar}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v132/shen21a/shen21a.pdf}, url = {https://proceedings.mlr.press/v132/shen21a.html}, abstract = {This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question of when and how $s$-sparse halfspaces can be efficiently learned under the challenging {\em malicious noise} model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity of $\tilde{O}(s \log^4\frac{d}{\epsilon})$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, under the assumption that the distribution over (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that the sample complexity is $\tilde{O}(\frac{1}{\epsilon}s^2 \log^5 d)$ which also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for soft outlier removal and for empirical risk minimization, and a new analysis of uniform concentration for unbounded instances – all of them crucially take the sparsity structure of the underlying halfspace into account.} }
Endnote
%0 Conference Paper %T Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance %A Jie Shen %A Chicheng Zhang %B Proceedings of the 32nd International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2021 %E Vitaly Feldman %E Katrina Ligett %E Sivan Sabato %F pmlr-v132-shen21a %I PMLR %P 1072--1113 %U https://proceedings.mlr.press/v132/shen21a.html %V 132 %X This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\mathbb{R}^d$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question of when and how $s$-sparse halfspaces can be efficiently learned under the challenging {\em malicious noise} model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity of $\tilde{O}(s \log^4\frac{d}{\epsilon})$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, under the assumption that the distribution over (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that the sample complexity is $\tilde{O}(\frac{1}{\epsilon}s^2 \log^5 d)$ which also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for soft outlier removal and for empirical risk minimization, and a new analysis of uniform concentration for unbounded instances – all of them crucially take the sparsity structure of the underlying halfspace into account.
APA
Shen, J. & Zhang, C.. (2021). Attribute-Efficient Learning of Halfspaces with Malicious Noise: Near-Optimal Label Complexity and Noise Tolerance. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:1072-1113 Available from https://proceedings.mlr.press/v132/shen21a.html.

Related Material