Scratching the Surface: Reflections of Training Data Properties in Early CNN Filters

Grayson Jorgenson, Cassie Heine, Robin Cosbey, Abby Reynolds, Davis Brown, Henry Kvinge, Timothy Doster, Tegan Emerson
Proceedings of the 1st Conference on Topology, Algebra, and Geometry in Data Science (TAG-DS 2025), PMLR 321:166-175, 2026.

Abstract

The ability to understand deep learning models by analyzing their weights is key to advancing the growing field of model interpretability. In this article, we study information about the training data of convolutional neural network (CNN) models that can be gleaned from analyzing just the first of their learned filters. While gradient updates to the model weights during training become increasingly complex in the deeper layers of typical CNNs, the updates to the initial layer can be simple enough that high-level dataset properties such as image sharpness, noisiness, and color distribution are prominently featured. We give a simple mathematical justification for this and demonstrate how training dataset properties appear in this way for several standard CNNs on a number of datasets.
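As a minimal illustration of the kind of first-layer analysis the abstract describes (a sketch, not the authors' code), one can summarize how a CNN's initial convolutional filters weight each input color channel. The weight tensor below is a random stand-in for learned filters; in a real analysis it would be taken from a trained model's first convolutional layer.

```python
import numpy as np

def channel_energy(first_layer_weights: np.ndarray) -> np.ndarray:
    """Fraction of total squared filter weight per input channel.

    first_layer_weights: array of shape (out_channels, in_channels, k, k),
    e.g. the learned filters of a CNN's initial convolutional layer.
    A skewed result suggests the training data's color distribution
    emphasized some channels over others.
    """
    energy = np.sum(first_layer_weights ** 2, axis=(0, 2, 3))
    return energy / energy.sum()

# Random stand-in for learned filters (64 filters, RGB input, 7x7 kernels).
# A real analysis would load these from a trained model instead.
rng = np.random.default_rng(0)
filters = rng.normal(size=(64, 3, 7, 7))

print(channel_energy(filters))  # roughly uniform for random filters
```

The same per-channel summary applied to filters trained on, say, a heavily red-tinted dataset would be expected to show a correspondingly skewed energy distribution.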

Cite this Paper


BibTeX
@InProceedings{pmlr-v321-jorgenson26a,
  title     = {Scratching the Surface: Reflections of Training Data Properties in Early CNN Filters},
  author    = {Jorgenson, Grayson and Heine, Cassie and Cosbey, Robin and Reynolds, Abby and Brown, Davis and Kvinge, Henry and Doster, Timothy and Emerson, Tegan},
  booktitle = {Proceedings of the 1st Conference on Topology, Algebra, and Geometry in Data Science (TAG-DS 2025)},
  pages     = {166--175},
  year      = {2026},
  editor    = {Bernardez Gil, Guillermo and Black, Mitchell and Cloninger, Alexander and Doster, Timothy and Emerson, Tegan and García-Rodondo, Inés and Holtz, Chester and Kotak, Mit and Kvinge, Henry and Mishne, Gal and Papillon, Mathilde and Pouplin, Alison and Rainey, Katie and Rieck, Bastian and Telyatnikov, Lev and Yeats, Eric and Wang, Qingsong and Wang, Yusu and Wayland, Jeremy},
  volume    = {321},
  series    = {Proceedings of Machine Learning Research},
  month     = {01--02 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v321/main/assets/jorgenson26a/jorgenson26a.pdf},
  url       = {https://proceedings.mlr.press/v321/jorgenson26a.html},
  abstract  = {The ability to understand deep learning models by analyzing their weights is key to advancing the growing field of model interpretability. In this article, we study information about the training data of convolutional neural network (CNN) models that can be gleaned from analyzing just the first of their learned filters. While gradient updates to the model weights during training become increasingly complex in the deeper layers of typical CNNs, the updates to the initial layer can be simple enough that high-level dataset properties such as image sharpness, noisiness, and color distribution are prominently featured. We give a simple mathematical justification for this and demonstrate how training dataset properties appear in this way for several standard CNNs on a number of datasets.}
}
Endnote
%0 Conference Paper
%T Scratching the Surface: Reflections of Training Data Properties in Early CNN Filters
%A Grayson Jorgenson
%A Cassie Heine
%A Robin Cosbey
%A Abby Reynolds
%A Davis Brown
%A Henry Kvinge
%A Timothy Doster
%A Tegan Emerson
%B Proceedings of the 1st Conference on Topology, Algebra, and Geometry in Data Science (TAG-DS 2025)
%C Proceedings of Machine Learning Research
%D 2026
%E Guillermo Bernardez Gil
%E Mitchell Black
%E Alexander Cloninger
%E Timothy Doster
%E Tegan Emerson
%E Inés García-Rodondo
%E Chester Holtz
%E Mit Kotak
%E Henry Kvinge
%E Gal Mishne
%E Mathilde Papillon
%E Alison Pouplin
%E Katie Rainey
%E Bastian Rieck
%E Lev Telyatnikov
%E Eric Yeats
%E Qingsong Wang
%E Yusu Wang
%E Jeremy Wayland
%F pmlr-v321-jorgenson26a
%I PMLR
%P 166--175
%U https://proceedings.mlr.press/v321/jorgenson26a.html
%V 321
%X The ability to understand deep learning models by analyzing their weights is key to advancing the growing field of model interpretability. In this article, we study information about the training data of convolutional neural network (CNN) models that can be gleaned from analyzing just the first of their learned filters. While gradient updates to the model weights during training become increasingly complex in the deeper layers of typical CNNs, the updates to the initial layer can be simple enough that high-level dataset properties such as image sharpness, noisiness, and color distribution are prominently featured. We give a simple mathematical justification for this and demonstrate how training dataset properties appear in this way for several standard CNNs on a number of datasets.
APA
Jorgenson, G., Heine, C., Cosbey, R., Reynolds, A., Brown, D., Kvinge, H., Doster, T. & Emerson, T. (2026). Scratching the Surface: Reflections of Training Data Properties in Early CNN Filters. Proceedings of the 1st Conference on Topology, Algebra, and Geometry in Data Science (TAG-DS 2025), in Proceedings of Machine Learning Research 321:166-175. Available from https://proceedings.mlr.press/v321/jorgenson26a.html.