Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise

Henry Reeve, Ata Kaban
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5401-5409, 2019.

Abstract

We consider classification in the presence of class-dependent asymmetric label noise with unknown noise probabilities. In this setting, identifiability conditions are known, but additional assumptions were shown to be required for finite sample rates, and so far only the parametric rate has been obtained. Assuming these identifiability conditions, together with a measure-smoothness condition on the regression function and Tsybakov’s margin condition, we show that the Robust kNN classifier of Gao et al. attains, the mini-max optimal rates of the noise-free setting, up to a log factor, even when trained on data with unknown asymmetric label noise. Hence, our results provide a solid theoretical backing for this empirically successful algorithm. By contrast the standard kNN is not even consistent in the setting of asymmetric label noise. A key idea in our analysis is a simple kNN based method for estimating the maximum of a function that requires far less assumptions than existing mode estimators do, and which may be of independent interest for noise proportion estimation and randomised optimisation problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-reeve19a, title = {Fast Rates for a k{NN} Classifier Robust to Unknown Asymmetric Label Noise}, author = {Reeve, Henry and Kaban, Ata}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {5401--5409}, year = {2019}, editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan}, volume = {97}, series = {Proceedings of Machine Learning Research}, month = {09--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v97/reeve19a/reeve19a.pdf}, url = {https://proceedings.mlr.press/v97/reeve19a.html}, abstract = {We consider classification in the presence of class-dependent asymmetric label noise with unknown noise probabilities. In this setting, identifiability conditions are known, but additional assumptions were shown to be required for finite sample rates, and so far only the parametric rate has been obtained. Assuming these identifiability conditions, together with a measure-smoothness condition on the regression function and Tsybakov’s margin condition, we show that the Robust kNN classifier of Gao et al. attains, the mini-max optimal rates of the noise-free setting, up to a log factor, even when trained on data with unknown asymmetric label noise. Hence, our results provide a solid theoretical backing for this empirically successful algorithm. By contrast the standard kNN is not even consistent in the setting of asymmetric label noise. A key idea in our analysis is a simple kNN based method for estimating the maximum of a function that requires far less assumptions than existing mode estimators do, and which may be of independent interest for noise proportion estimation and randomised optimisation problems.} }
Endnote
%0 Conference Paper %T Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise %A Henry Reeve %A Ata Kaban %B Proceedings of the 36th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Ruslan Salakhutdinov %F pmlr-v97-reeve19a %I PMLR %P 5401--5409 %U https://proceedings.mlr.press/v97/reeve19a.html %V 97 %X We consider classification in the presence of class-dependent asymmetric label noise with unknown noise probabilities. In this setting, identifiability conditions are known, but additional assumptions were shown to be required for finite sample rates, and so far only the parametric rate has been obtained. Assuming these identifiability conditions, together with a measure-smoothness condition on the regression function and Tsybakov’s margin condition, we show that the Robust kNN classifier of Gao et al. attains, the mini-max optimal rates of the noise-free setting, up to a log factor, even when trained on data with unknown asymmetric label noise. Hence, our results provide a solid theoretical backing for this empirically successful algorithm. By contrast the standard kNN is not even consistent in the setting of asymmetric label noise. A key idea in our analysis is a simple kNN based method for estimating the maximum of a function that requires far less assumptions than existing mode estimators do, and which may be of independent interest for noise proportion estimation and randomised optimisation problems.
APA
Reeve, H. & Kaban, A.. (2019). Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5401-5409 Available from https://proceedings.mlr.press/v97/reeve19a.html.

Related Material