Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks

Teresa Dorszewski, Lenka Tětková, Lorenz Linhardt, Lars Kai Hansen
Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), PMLR 265:41-50, 2025.

Abstract

Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. Motivated by theories of human cognition, this study examines the relationship between convexity in neural network representations and human-machine alignment based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest that the convex regions formed in the latent spaces of neural networks align, to some extent, with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.
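The notion of concept convexity studied in the paper can be illustrated with a toy check. The sketch below is an illustrative assumption, not the paper's own measure: the `latent_convexity_score` helper and the Euclidean-segment criterion are hypothetical. It samples points on line segments between pairs of same-class latent vectors and scores the fraction whose nearest labeled neighbor belongs to that same class, so a class whose region is convex scores near 1.

```python
import numpy as np

def latent_convexity_score(latents, labels, n_pairs=200, n_interp=5, seed=0):
    """Toy Euclidean convexity score: the fraction of points sampled on
    segments between same-class latent vectors whose nearest labeled
    neighbor (excluding the two endpoints) has that same class."""
    rng = np.random.default_rng(seed)
    hits, total = 0, 0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            continue
        for _ in range(n_pairs):
            i, j = rng.choice(idx, size=2, replace=False)
            # Sample interior points of the segment between the two latents.
            for t in np.linspace(0.2, 0.8, n_interp):
                p = (1 - t) * latents[i] + t * latents[j]
                d = np.linalg.norm(latents - p, axis=1)
                d[[i, j]] = np.inf  # exclude the segment's endpoints
                hits += int(labels[np.argmin(d)] == c)
                total += 1
    return hits / total

# Two well-separated Gaussian clusters: their regions are (near-)convex,
# so the score should be close to 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(3, 0.1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
print(latent_convexity_score(X, y))
```

Applied to real model latents, such a score can be compared across layers or between pretrained and fine-tuned checkpoints, which is the kind of comparison the abstract describes at a high level.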

Cite this Paper


BibTeX
@InProceedings{pmlr-v265-dorszewski25a,
  title     = {Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks},
  author    = {Dorszewski, Teresa and T{\v{e}}tkov{\'a}, Lenka and Linhardt, Lorenz and Hansen, Lars Kai},
  booktitle = {Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)},
  pages     = {41--50},
  year      = {2025},
  editor    = {Lutchyn, Tetiana and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume    = {265},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--09 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v265/main/assets/dorszewski25a/dorszewski25a.pdf},
  url       = {https://proceedings.mlr.press/v265/dorszewski25a.html},
  abstract  = {Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. Motivated by theories of human cognition, this study examines the relationship between convexity in neural network representations and human-machine alignment based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest the convex regions formed in latent spaces of neural networks to some extent align with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.}
}
Endnote
%0 Conference Paper
%T Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks
%A Teresa Dorszewski
%A Lenka Tětková
%A Lorenz Linhardt
%A Lars Kai Hansen
%B Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2025
%E Tetiana Lutchyn
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v265-dorszewski25a
%I PMLR
%P 41--50
%U https://proceedings.mlr.press/v265/dorszewski25a.html
%V 265
%X Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. Motivated by theories of human cognition, this study examines the relationship between convexity in neural network representations and human-machine alignment based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest the convex regions formed in latent spaces of neural networks to some extent align with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.
APA
Dorszewski, T., Tětková, L., Linhardt, L. & Hansen, L. K. (2025). Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks. Proceedings of the 6th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 265:41-50. Available from https://proceedings.mlr.press/v265/dorszewski25a.html.

Related Material