QActor: Active Learning on Noisy Labels

Taraneh Younesian, Zilong Zhao, Amirmasoud Ghiassi, Robert Birke, Lydia Y Chen
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:548-563, 2021.

Abstract

Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media from non-experts. Active querying experts are conventionally adopted to provide labels for the informative samples which don’t have labels, instead of possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning. In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure \emph{CENT}, which considers both cross-entropy and entropy to select informative and noisy labels for an expert cleansing. QActor iteratively cleans samples via quality models and actively querying an expert on those noisy yet informative samples. To adapt to learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss for each learning iteration. We extensively evaluate different image datasets with noise label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle.

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-younesian21a, title = {QActor: Active Learning on Noisy Labels}, author = {Younesian, Taraneh and Zhao, Zilong and Ghiassi, Amirmasoud and Birke, Robert and Chen, Lydia Y}, booktitle = {Proceedings of The 13th Asian Conference on Machine Learning}, pages = {548--563}, year = {2021}, editor = {Balasubramanian, Vineeth N. and Tsang, Ivor}, volume = {157}, series = {Proceedings of Machine Learning Research}, month = {17--19 Nov}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v157/younesian21a/younesian21a.pdf}, url = {https://proceedings.mlr.press/v157/younesian21a.html}, abstract = {Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media from non-experts. Active querying experts are conventionally adopted to provide labels for the informative samples which don’t have labels, instead of possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning. In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure \emph{CENT}, which considers both cross-entropy and entropy to select informative and noisy labels for an expert cleansing. QActor iteratively cleans samples via quality models and actively querying an expert on those noisy yet informative samples. To adapt to learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss for each learning iteration. We extensively evaluate different image datasets with noise label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle.} }
Endnote
%0 Conference Paper %T QActor: Active Learning on Noisy Labels %A Taraneh Younesian %A Zilong Zhao %A Amirmasoud Ghiassi %A Robert Birke %A Lydia Y Chen %B Proceedings of The 13th Asian Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Vineeth N. Balasubramanian %E Ivor Tsang %F pmlr-v157-younesian21a %I PMLR %P 548--563 %U https://proceedings.mlr.press/v157/younesian21a.html %V 157 %X Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media from non-experts. Active querying experts are conventionally adopted to provide labels for the informative samples which don’t have labels, instead of possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning. In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure \emph{CENT}, which considers both cross-entropy and entropy to select informative and noisy labels for an expert cleansing. QActor iteratively cleans samples via quality models and actively querying an expert on those noisy yet informative samples. To adapt to learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss for each learning iteration. We extensively evaluate different image datasets with noise label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle.
APA
Younesian, T., Zhao, Z., Ghiassi, A., Birke, R. & Chen, L.Y.. (2021). QActor: Active Learning on Noisy Labels. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:548-563 Available from https://proceedings.mlr.press/v157/younesian21a.html.

Related Material