Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources

Yi-Xuan Sun, Ya-Lin Zhang, Bin Han, Longfei Li, Jun Zhou
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:47261-47279, 2024.

Abstract

The strong performance of neural networks typically hinges on the availability of extensive labeled data, yet acquiring ground-truth labels is often challenging. Instead, noisy supervisions from multiple sources, e.g., by multiple well-designed rules, are more convenient to collect. In this paper, we focus on the realistic problem of learning from multiple noisy label sources, and argue that prior studies have overlooked the crucial self-cognition ability of neural networks, i.e., the inherent capability of autonomously distinguishing noise during training. We theoretically analyze this ability of neural networks when meeting multiple noisy label sources, which reveals that neural networks possess the capability to recognize both instance-wise noise within each single noisy label source and annotator-wise quality among multiple noisy label sources. Inspired by the theoretical analyses, we introduce an approach named Self-cognitive Denoising for Multiple noisy label sources (SDM), which exploits the self-cognition ability of neural networks to denoise during training. Furthermore, we build a selective distillation module following the theoretical insights to optimize computational efficiency. The experiments on various datasets demonstrate the superiority of our method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-sun24o,
  title     = {Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources},
  author    = {Sun, Yi-Xuan and Zhang, Ya-Lin and Han, Bin and Li, Longfei and Zhou, Jun},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {47261--47279},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/sun24o/sun24o.pdf},
  url       = {https://proceedings.mlr.press/v235/sun24o.html},
  abstract  = {The strong performance of neural networks typically hinges on the availability of extensive labeled data, yet acquiring ground-truth labels is often challenging. Instead, noisy supervisions from multiple sources, e.g., by multiple well-designed rules, are more convenient to collect. In this paper, we focus on the realistic problem of learning from multiple noisy label sources, and argue that prior studies have overlooked the crucial self-cognition ability of neural networks, i.e., the inherent capability of autonomously distinguishing noise during training. We theoretically analyze this ability of neural networks when meeting multiple noisy label sources, which reveals that neural networks possess the capability to recognize both instance-wise noise within each single noisy label source and annotator-wise quality among multiple noisy label sources. Inspired by the theoretical analyses, we introduce an approach named Self-cognitive Denoising for Multiple noisy label sources (SDM), which exploits the self-cognition ability of neural networks to denoise during training. Furthermore, we build a selective distillation module following the theoretical insights to optimize computational efficiency. The experiments on various datasets demonstrate the superiority of our method.}
}
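To use this entry, save it in a bibliography file (here a hypothetical `refs.bib`) and cite it by its key, `pmlr-v235-sun24o`; a minimal LaTeX sketch, assuming the `natbib` package:

```latex
\documentclass{article}
\usepackage{natbib} % assumed citation package; plain \cite also works without it

\begin{document}
SDM \citep{pmlr-v235-sun24o} exploits the self-cognition ability of
neural networks to denoise across multiple noisy label sources.

\bibliographystyle{plainnat}
\bibliography{refs} % refs.bib contains the @InProceedings entry above
\end{document}
```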
Endnote
%0 Conference Paper
%T Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources
%A Yi-Xuan Sun
%A Ya-Lin Zhang
%A Bin Han
%A Longfei Li
%A Jun Zhou
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-sun24o
%I PMLR
%P 47261--47279
%U https://proceedings.mlr.press/v235/sun24o.html
%V 235
%X The strong performance of neural networks typically hinges on the availability of extensive labeled data, yet acquiring ground-truth labels is often challenging. Instead, noisy supervisions from multiple sources, e.g., by multiple well-designed rules, are more convenient to collect. In this paper, we focus on the realistic problem of learning from multiple noisy label sources, and argue that prior studies have overlooked the crucial self-cognition ability of neural networks, i.e., the inherent capability of autonomously distinguishing noise during training. We theoretically analyze this ability of neural networks when meeting multiple noisy label sources, which reveals that neural networks possess the capability to recognize both instance-wise noise within each single noisy label source and annotator-wise quality among multiple noisy label sources. Inspired by the theoretical analyses, we introduce an approach named Self-cognitive Denoising for Multiple noisy label sources (SDM), which exploits the self-cognition ability of neural networks to denoise during training. Furthermore, we build a selective distillation module following the theoretical insights to optimize computational efficiency. The experiments on various datasets demonstrate the superiority of our method.
APA
Sun, Y., Zhang, Y., Han, B., Li, L. & Zhou, J. (2024). Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:47261-47279. Available from https://proceedings.mlr.press/v235/sun24o.html.