From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9625-9635, 2020.

Abstract

Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset—including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for. Finally, our findings emphasize the need to augment our current model training and evaluation toolkit to take such misalignment into account.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-tsipras20a,
  title     = {From {I}mage{N}et to Image Classification: Contextualizing Progress on Benchmarks},
  author    = {Tsipras, Dimitris and Santurkar, Shibani and Engstrom, Logan and Ilyas, Andrew and Madry, Aleksander},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9625--9635},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/tsipras20a/tsipras20a.pdf},
  url       = {https://proceedings.mlr.press/v119/tsipras20a.html},
  abstract  = {Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset—including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for. Finally, our findings emphasize the need to augment our current model training and evaluation toolkit to take such misalignment into account.}
}
EndNote
%0 Conference Paper
%T From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
%A Dimitris Tsipras
%A Shibani Santurkar
%A Logan Engstrom
%A Andrew Ilyas
%A Aleksander Madry
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-tsipras20a
%I PMLR
%P 9625--9635
%U https://proceedings.mlr.press/v119/tsipras20a.html
%V 119
%X Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset—including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for. Finally, our findings emphasize the need to augment our current model training and evaluation toolkit to take such misalignment into account.
APA
Tsipras, D., Santurkar, S., Engstrom, L., Ilyas, A. & Madry, A. (2020). From ImageNet to Image Classification: Contextualizing Progress on Benchmarks. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9625-9635. Available from https://proceedings.mlr.press/v119/tsipras20a.html.