Active Testing: An Efficient and Robust Framework for Estimating Accuracy

Phuc Nguyen, Deva Ramanan, Charless Fowlkes
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3759-3768, 2018.

Abstract

Much recent work on large-scale visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are found through manual inspection of results. In this work, we re-formulate the problem as one of active testing, and examine strategies for efficiently querying a user so as to obtain an accurate performance estimate with minimal vetting. We demonstrate the effectiveness of our proposed active testing framework on estimating two performance metrics, Precision@K and mean Average Precision, for two popular computer vision tasks, multilabel classification and instance segmentation, respectively. We further show that our approach significantly reduces human annotation effort and is more robust than alternative evaluation protocols.
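To make the vetting idea concrete, the sketch below shows one naive way to estimate Precision@K under a small vetting budget: vet a random subset of the top-K predictions and fall back on the noisy labels for the rest. This is an illustrative baseline, not the paper's estimator; names such as estimate_prec_at_k, vet_oracle, and budget are hypothetical, and the paper's framework selects which examples to vet more carefully and combines vetted and unvetted labels with a probabilistic estimate.

import numpy as np

def estimate_prec_at_k(scores, noisy_labels, vet_oracle, k, budget, seed=0):
    # scores       : model confidence for each test item
    # noisy_labels : possibly-wrong 0/1 annotations for each test item
    # vet_oracle   : callable index -> true 0/1 label (stands in for the human)
    # budget       : number of items (<= k) the human is willing to vet
    rng = np.random.default_rng(seed)
    top_k = np.argsort(-scores)[:k]                  # K highest-scoring items
    to_vet = rng.choice(top_k, size=budget, replace=False)
    vetted = {int(i): vet_oracle(int(i)) for i in to_vet}
    # Vetted items use the clean label; everything else keeps its noisy label.
    labels = np.array([vetted.get(int(i), noisy_labels[i]) for i in top_k])
    return labels.mean()

# Toy usage: noisy labels that disagree with the truth 20% of the time.
rng = np.random.default_rng(1)
truth = rng.integers(0, 2, 1000)
scores = truth + rng.normal(0.0, 0.8, 1000)          # scores loosely track truth
noisy = np.where(rng.random(1000) < 0.2, 1 - truth, truth)
print(estimate_prec_at_k(scores, noisy, lambda i: truth[i], k=100, budget=30))

Even this random-vetting baseline interpolates between the fully noisy estimate (budget=0) and exhaustive re-annotation (budget=k); the active querying strategies studied in the paper aim to close that gap with far fewer human queries.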

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-nguyen18d,
  title = {Active Testing: An Efficient and Robust Framework for Estimating Accuracy},
  author = {Nguyen, Phuc and Ramanan, Deva and Fowlkes, Charless},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {3759--3768},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/nguyen18d/nguyen18d.pdf},
  url = {https://proceedings.mlr.press/v80/nguyen18d.html},
  abstract = {Much recent work on large-scale visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are found through manual inspection of results. In this work, we re-formulate the problem as one of active testing, and examine strategies for efficiently querying a user so as to obtain an accurate performance estimate with minimal vetting. We demonstrate the effectiveness of our proposed active testing framework on estimating two performance metrics, Precision@K and mean Average Precision, for two popular computer vision tasks, multilabel classification and instance segmentation, respectively. We further show that our approach significantly reduces human annotation effort and is more robust than alternative evaluation protocols.}
}
Endnote
%0 Conference Paper
%T Active Testing: An Efficient and Robust Framework for Estimating Accuracy
%A Phuc Nguyen
%A Deva Ramanan
%A Charless Fowlkes
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-nguyen18d
%I PMLR
%P 3759--3768
%U https://proceedings.mlr.press/v80/nguyen18d.html
%V 80
%X Much recent work on large-scale visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are found through manual inspection of results. In this work, we re-formulate the problem as one of active testing, and examine strategies for efficiently querying a user so as to obtain an accurate performance estimate with minimal vetting. We demonstrate the effectiveness of our proposed active testing framework on estimating two performance metrics, Precision@K and mean Average Precision, for two popular computer vision tasks, multilabel classification and instance segmentation, respectively. We further show that our approach significantly reduces human annotation effort and is more robust than alternative evaluation protocols.
APA
Nguyen, P., Ramanan, D. & Fowlkes, C. (2018). Active Testing: An Efficient and Robust Framework for Estimating Accuracy. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3759-3768. Available from https://proceedings.mlr.press/v80/nguyen18d.html.
