Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks

Teresa Dorszewski, Lenka Tětková, Lorenz Linhardt, Lars Kai Hansen
Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), PMLR 265:41-50, 2025.

Abstract

Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. Motivated by theories of human cognition, this study examines the relationship between convexity in neural network representations and human-machine alignment based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest that the convex regions formed in the latent spaces of neural networks align, to some extent, with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.
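The notion of concept convexity studied in the paper can be illustrated with a toy check. The sketch below is an illustrative assumption, not the paper's own measure: the `latent_convexity_score` helper and the Euclidean-segment criterion are hypothetical. It samples points on line segments between pairs of same-class latent vectors and scores the fraction whose nearest labeled neighbor belongs to that same class, so a class whose region is convex scores near 1.

```python
import numpy as np

def latent_convexity_score(latents, labels, n_pairs=200, n_interp=5, seed=0):
    """Toy Euclidean convexity score: the fraction of points sampled on
    segments between same-class latent vectors whose nearest labeled
    neighbor (excluding the two endpoints) has that same class."""
    rng = np.random.default_rng(seed)
    hits, total = 0, 0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            continue
        for _ in range(n_pairs):
            i, j = rng.choice(idx, size=2, replace=False)
            # Sample interior points of the segment between the two latents.
            for t in np.linspace(0.2, 0.8, n_interp):
                p = (1 - t) * latents[i] + t * latents[j]
                d = np.linalg.norm(latents - p, axis=1)
                d[[i, j]] = np.inf  # exclude the segment's endpoints
                hits += int(labels[np.argmin(d)] == c)
                total += 1
    return hits / total

# Two well-separated Gaussian clusters: their regions are (near-)convex,
# so the score should be close to 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(3, 0.1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
print(latent_convexity_score(X, y))
```

Applied to real model latents, such a score can be compared across layers or between pretrained and fine-tuned checkpoints, which is the kind of comparison the abstract describes at a high level.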

Cite this Paper


BibTeX
@InProceedings{pmlr-v265-dorszewski25a,
  title     = {Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks},
  author    = {Dorszewski, Teresa and T{\v{e}}tkov{\'a}, Lenka and Linhardt, Lorenz and Hansen, Lars Kai},
  booktitle = {Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)},
  pages     = {41--50},
  year      = {2025},
  editor    = {Lutchyn, Tetiana and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume    = {265},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--09 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v265/main/assets/dorszewski25a/dorszewski25a.pdf},
  url       = {https://proceedings.mlr.press/v265/dorszewski25a.html},
  abstract  = {Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. Motivated by theories of human cognition, this study examines the relationship between convexity in neural network representations and human-machine alignment based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest the convex regions formed in latent spaces of neural networks to some extent align with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.}
}
Endnote
%0 Conference Paper
%T Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks
%A Teresa Dorszewski
%A Lenka Tětková
%A Lorenz Linhardt
%A Lars Kai Hansen
%B Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2025
%E Tetiana Lutchyn
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v265-dorszewski25a
%I PMLR
%P 41--50
%U https://proceedings.mlr.press/v265/dorszewski25a.html
%V 265
%X Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. Motivated by theories of human cognition, this study examines the relationship between convexity in neural network representations and human-machine alignment based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest the convex regions formed in latent spaces of neural networks to some extent align with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.
APA
Dorszewski, T., Tětková, L., Linhardt, L. & Hansen, L. K. (2025). Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks. Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 265:41-50. Available from https://proceedings.mlr.press/v265/dorszewski25a.html.

Related Material