Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers

Piotr Teterwak, Chiyuan Zhang, Dilip Krishnan, Michael C Mozer
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:10225-10235, 2021.

Abstract

A discriminatively trained neural net classifier can fit the training data perfectly if all information about its input other than class membership has been discarded prior to the output layer. Surprisingly, past research has discovered that some extraneous visual detail remains in the unnormalized logits. This finding is based on inversion techniques that map deep embeddings back to images. We explore this phenomenon further using a novel synthesis of methods, yielding a feedforward inversion model that produces remarkably high-fidelity reconstructions, qualitatively superior to those of past efforts. When applied to an adversarially robust classifier model, the reconstructions contain sufficient local detail and global structure that they might be confused with the original image at a quick glance, and the object category can clearly be gleaned from the reconstruction. Our approach is based on BigGAN (Brock et al., 2019), with conditioning on logits instead of one-hot class labels. We use our reconstruction model as a tool for exploring the nature of representations, including: the influence of model architecture and training objectives (specifically robust losses), the forms of invariance that networks achieve, representational differences between correctly and incorrectly classified images, and the effects of manipulating logits and images. We believe that our method can inspire future investigations into the nature of information flow in a neural net and can provide diagnostics for improving discriminative models. We provide pre-trained models and visualizations at https://sites.google.com/view/understanding-invariance/home.
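The central architectural change described in the abstract is conditioning a BigGAN-style generator on the classifier's unnormalized logits rather than on a one-hot class label. The PyTorch sketch below illustrates only that idea; it is not the authors' implementation (which operates at BigGAN scale). The class name LogitConditionedGenerator, the layer sizes, and the toy output resolution are assumptions made here for brevity: the one-hot class-embedding lookup is replaced by a learned linear projection of the logits, which is then fed to the generator alongside the noise vector.

# Illustrative sketch only; not the authors' code.
import torch
import torch.nn as nn

class LogitConditionedGenerator(nn.Module):  # hypothetical name
    """Toy generator stub: the conditioning vector is a learned projection of
    the classifier's logits instead of an nn.Embedding lookup on a class id."""
    def __init__(self, num_classes=1000, cond_dim=128, z_dim=120, img_ch=3):
        super().__init__()
        # Replaces nn.Embedding(num_classes, cond_dim) in a label-conditional GAN.
        self.logit_proj = nn.Linear(num_classes, cond_dim)
        # Tiny stand-in for the BigGAN upsampling stack (4x4 -> 16x16).
        self.net = nn.Sequential(
            nn.Linear(z_dim + cond_dim, 4 * 4 * 64),
            nn.ReLU(),
            nn.Unflatten(1, (64, 4, 4)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, img_ch, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z, logits):
        cond = self.logit_proj(logits)            # conditioning derived from logits
        return self.net(torch.cat([z, cond], 1))  # reconstructed image

# Usage: the logits would come from the frozen, discriminatively trained classifier.
gen = LogitConditionedGenerator()
z = torch.randn(2, 120)
logits = torch.randn(2, 1000)
img = gen(z, logits)  # shape (2, 3, 16, 16) at this toy resolution

Because the logits carry more than the argmax class, this conditioning lets the inversion model recover image detail that a pure one-hot label could not encode, which is the invariance question the paper probes.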

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-teterwak21a,
  title     = {Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers},
  author    = {Teterwak, Piotr and Zhang, Chiyuan and Krishnan, Dilip and Mozer, Michael C},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {10225--10235},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/teterwak21a/teterwak21a.pdf},
  url       = {https://proceedings.mlr.press/v139/teterwak21a.html}
}
Endnote
%0 Conference Paper
%T Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers
%A Piotr Teterwak
%A Chiyuan Zhang
%A Dilip Krishnan
%A Michael C Mozer
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-teterwak21a
%I PMLR
%P 10225--10235
%U https://proceedings.mlr.press/v139/teterwak21a.html
%V 139
APA
Teterwak, P., Zhang, C., Krishnan, D. & Mozer, M.C. (2021). Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:10225-10235. Available from https://proceedings.mlr.press/v139/teterwak21a.html.