Play the (Mis)Match: Using fMRI-Aligned Feature Fine-Tuning to Reveal Shortcut Bias in Deep Neural Networks
Proceedings of the First Workshop on NeuroAI Multimodal Intelligence @ AAAI 2026, PMLR 308:99-107, 2026.
Abstract
Deep neural networks (DNNs) often “cheat” by relying on shortcut objects (e.g., food$\Rightarrow$kitchen) rather than holistic spatial layout, which undermines out-of-distribution (OOD) robustness. This work is a proof-of-concept exploration of whether fMRI alignment can reduce shortcut bias in visual DNNs. We address this question with Play the (Mis)Match, a diagnostic dataset and brain-aligned fine-tuning framework. Leveraging fMRI recordings from the Natural Scenes Dataset (four participants; four scene categories: bedroom, bathroom, living room, kitchen), we curate MATCH images, in which shortcut cues co-occur with their usual scene, and MISMATCH images, from which those cues are removed. ImageNet-initialised CNN and Transformer backbones are fine-tuned with an MSE alignment loss that steers their intermediate features toward voxel patterns known to be less sensitive to shortcut cues, particularly activity from the scene-selective cortex (PPA, RSC, OPA). For ResNet, this procedure narrows the Match–Mismatch accuracy gap by 24% and redirects Grad-CAM attention from individual objects to holistic scene structure, all without explicit shortcut annotations. These results suggest that human-brain constraints may help steer DNNs toward more semantically grounded, less shortcut-dependent scene representations.
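As an illustration of the kind of objective the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes that intermediate features are mapped to voxel space by a linear readout and that the MSE alignment term is added to a standard cross-entropy classification loss with a weight `lam`. The abstract states neither detail, and all function and parameter names here are hypothetical. The last two helpers compute the Match–Mismatch accuracy gap and its relative reduction. The abstract does not say whether the reported 24% is a relative or an absolute change, so the relative reading is only one possible interpretation.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and measured voxel responses."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def project(features: Sequence[float],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> list:
    """Linear readout (an assumption): one weight row and one bias per voxel."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(logits, label, features, weights, bias, voxels, lam=1.0):
    """Classification loss plus lam times the MSE alignment term (assumed form)."""
    alignment = mse(project(features, weights, bias), voxels)
    return cross_entropy(logits, label) + lam * alignment


def accuracy_gap(match_acc: float, mismatch_acc: float) -> float:
    """Gap between accuracy on MATCH images and accuracy on MISMATCH images."""
    return match_acc - mismatch_acc


def relative_gap_reduction(gap_before: float, gap_after: float) -> float:
    """Fraction by which the gap shrinks after fine-tuning (relative reading)."""
    if gap_before == 0:
        raise ValueError("gap_before must be non-zero")
    return (gap_before - gap_after) / gap_before
```

Under these assumptions, setting `lam=0` recovers plain classification fine-tuning, which gives a natural baseline against which the effect of the alignment term on the Match–Mismatch gap could be compared.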