Do ImageNet Classifiers Generalize to ImageNet?

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5389-5400, 2019.

Abstract

We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models’ inability to generalize to slightly "harder" images than those found in the original test sets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-recht19a,
  title     = {Do {I}mage{N}et Classifiers Generalize to {I}mage{N}et?},
  author    = {Recht, Benjamin and Roelofs, Rebecca and Schmidt, Ludwig and Shankar, Vaishaal},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {5389--5400},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/recht19a/recht19a.pdf},
  url       = {https://proceedings.mlr.press/v97/recht19a.html},
  abstract  = {We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models’ inability to generalize to slightly "harder" images than those found in the original test sets.}
}
Endnote
%0 Conference Paper
%T Do ImageNet Classifiers Generalize to ImageNet?
%A Benjamin Recht
%A Rebecca Roelofs
%A Ludwig Schmidt
%A Vaishaal Shankar
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-recht19a
%I PMLR
%P 5389--5400
%U https://proceedings.mlr.press/v97/recht19a.html
%V 97
%X We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models’ inability to generalize to slightly "harder" images than those found in the original test sets.
APA
Recht, B., Roelofs, R., Schmidt, L. & Shankar, V. (2019). Do ImageNet Classifiers Generalize to ImageNet?. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5389-5400. Available from https://proceedings.mlr.press/v97/recht19a.html.