How is the Socio-Demographic Background of Researchers in AI & ML Related to the Values reflected in their Research?

Paula Nauta, Fariba Karimi, Ana María Jaramillo
Proceedings of Fourth European Workshop on Algorithmic Fairness, PMLR 294:481-486, 2025.

Abstract

In this work, we investigate the socio-demographic factors influencing the production of influential Artificial Intelligence (AI) and Machine Learning (ML) research. Building on prior work that identified a predominance of power-centralizing values and an underrepresentation of user rights and ethical principles in AI & ML publications, this study analyzes whether the socio-demographic composition of authors influences the prevalence of these values. We first analyze an initial dataset of seed publications: the most cited publications presented at the top-tier conferences NeurIPS and ICML in four selected years (2008, 2009, 2018, and 2019). We then construct an enriched dataset of all publications from the same conferences and years using open-access research platforms such as Semantic Scholar and OpenAlex. Publications in the enriched dataset are identified as closely related to one of two groups derived from the initial annotations of the seed publications: (i) a moral group and (ii) a non-moral group. This is achieved by computing the Jaccard similarity between the reference lists of pairs of publications, constructing a similarity-based network from these scores, and then applying backbone extraction and ego-network extraction. Diversity scores are calculated for each research collaboration, enabling a statistical comparison between the two groups of publications. Results from human validation reveal that, although the method successfully constructs a similarity-based measure, it does not reliably infer shared moral values: publications closely tied to a publication categorized as moral do not necessarily share its values, despite a high overlap in their references. Additional results show that the diversity characteristics of research collaborations in both groups have no statistically significant relationship with the moral classification. Although there is some diversity, the overall observations show a significant underrepresentation of women and a concentration of researchers from a few nationalities, elite institutions, and technology companies, predominantly from the Global North.
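The abstract names the labeling pipeline (Jaccard reference overlap, similarity network, backbone extraction, ego-network extraction) but not its parameters. The Python sketch below shows one way these steps could fit together, under stated assumptions: reference lists are plain sets of hypothetical IDs; backbone extraction uses the disparity filter of Serrano et al. (2009), a swapped-in choice since the abstract does not name the method; and each non-seed paper takes the label of the seed whose radius-1 ego network in the backbone it joins through the heaviest edge. All function names, the `alpha` value, and the toy data are hypothetical, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two reference sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(refs: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Undirected weighted graph (adjacency dict): an edge wherever two papers share a reference."""
    graph = defaultdict(dict)
    for p, q in combinations(refs, 2):
        w = jaccard(refs[p], refs[q])
        if w > 0:
            graph[p][q] = w
            graph[q][p] = w
    return graph


def disparity_backbone(graph, alpha=0.05):
    """Disparity filter (assumed choice): keep an edge if it is significant for at least one endpoint."""
    kept = defaultdict(dict)
    for u, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue  # with one neighbour the filter can never flag the edge from this side
        strength = sum(nbrs.values())
        for v, w in nbrs.items():
            p = w / strength
            if (1 - p) ** (k - 1) < alpha:  # probability of a weight this large under the null model
                kept[u][v] = w
                kept[v][u] = w
    return kept


def label_by_ego(backbone, seed_labels):
    """Hypothetical rule: a non-seed paper in a seed's radius-1 ego network takes the label of its heaviest seed link."""
    best = {}
    for seed, label in seed_labels.items():
        for paper, w in backbone.get(seed, {}).items():
            if paper in seed_labels:
                continue
            if paper not in best or w > best[paper][0]:
                best[paper] = (w, label)
    return {paper: label for paper, (_, label) in best.items()}


# Toy data (hypothetical paper and reference IDs).
REFS = {
    "S1": {"r1", "r2", "r3", "r4"},
    "A": {"r1", "r2", "r3", "r5"},
    "S2": {"r6", "r7", "r8", "r9"},
    "B": {"r6", "r7", "r8", "r10"},
    "C": {"r4", "r9", "r11"},
}
SEEDS = {"S1": "moral", "S2": "non-moral"}

backbone = disparity_backbone(similarity_network(REFS), alpha=0.3)
labels = label_by_ego(backbone, SEEDS)
print(labels)  # {'A': 'moral', 'B': 'non-moral'}
```

In the toy data, paper C is linked to both seeds only through one weak shared reference each, so the backbone prunes those edges and C stays unlabeled, while A and B inherit the labels of their seeds. The loose `alpha = 0.3` is chosen only so that a five-node example keeps any edges at all; a real citation network would use a stricter threshold.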
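The abstract also does not say which diversity score or statistical test is used. As a hypothetical illustration only, the sketch below scores each collaboration with Blau's index (one minus the sum of squared category shares) over a single author attribute, and compares the moral and non-moral groups with a two-sided permutation test on the difference in mean scores. Neither choice is claimed to be the authors' method, and the attribute lists are invented placeholders.

```python
import random
from collections import Counter
from statistics import mean


def blau_index(attributes):
    """Blau's index 1 - sum(p_k^2): 0 for a homogeneous team, larger for a more even mix of categories."""
    n = len(attributes)
    if n == 0:
        raise ValueError("team has no authors")
    counts = Counter(attributes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in mean diversity between two groups."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical author-attribute lists, one list per paper.
moral_teams = [["woman", "man", "man"], ["man", "man"], ["woman", "woman", "man", "man"]]
non_moral_teams = [["man", "man", "man"], ["woman", "man"], ["man"]]

moral_scores = [blau_index(t) for t in moral_teams]
non_moral_scores = [blau_index(t) for t in non_moral_teams]
p_value = permutation_test(moral_scores, non_moral_scores, n_perm=5_000)
print(f"p = {p_value:.3f}")
```

A permutation test is used here because it makes no distributional assumption about the diversity scores, which are bounded and often clumped at zero for small, homogeneous teams; for example, a team of one woman and two men scores about 0.44.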

Cite this Paper


BibTeX
@InProceedings{pmlr-v294-nauta25a,
  title     = {How is the Socio-Demographic Background of Researchers in AI \& ML Related to the Values reflected in their Research?},
  author    = {Nauta, Paula and Karimi, Fariba and Jaramillo, Ana Mar{\'i}a},
  booktitle = {Proceedings of Fourth European Workshop on Algorithmic Fairness},
  pages     = {481--486},
  year      = {2025},
  editor    = {Weerts, Hilde and Pechenizkiy, Mykola and Allhutter, Doris and Corr{\^e}a, Ana Maria and Grote, Thomas and Liem, Cynthia},
  volume    = {294},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--02 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v294/main/assets/nauta25a/nauta25a.pdf},
  url       = {https://proceedings.mlr.press/v294/nauta25a.html},
  abstract  = {In this work, we investigate the socio-demographic factors influencing the production of influential Artificial Intelligence (AI) and Machine Learning (ML) research. Building on prior work that identified a predominance of power-centralizing values and an underrepresentation of user rights and ethical principles in AI \& ML publications, this study analyzes whether the socio-demographic composition of authors influences the prevalence of these values. We first analyze an initial dataset of seed publications: the most cited publications presented at the top-tier conferences NeurIPS and ICML in four selected years (2008, 2009, 2018, and 2019). We then construct an enriched dataset of all publications from the same conferences and years using open-access research platforms such as Semantic Scholar and OpenAlex. Publications in the enriched dataset are identified as closely related to one of two groups derived from the initial annotations of the seed publications: (i) a moral group and (ii) a non-moral group. This is achieved by computing the Jaccard similarity between the reference lists of pairs of publications, constructing a similarity-based network from these scores, and then applying backbone extraction and ego-network extraction. Diversity scores are calculated for each research collaboration, enabling a statistical comparison between the two groups of publications. Results from human validation reveal that, although the method successfully constructs a similarity-based measure, it does not reliably infer shared moral values: publications closely tied to a publication categorized as moral do not necessarily share its values, despite a high overlap in their references. Additional results show that the diversity characteristics of research collaborations in both groups have no statistically significant relationship with the moral classification. Although there is some diversity, the overall observations show a significant underrepresentation of women and a concentration of researchers from a few nationalities, elite institutions, and technology companies, predominantly from the Global North.}
}
Endnote
%0 Conference Paper
%T How is the Socio-Demographic Background of Researchers in AI & ML Related to the Values reflected in their Research?
%A Paula Nauta
%A Fariba Karimi
%A Ana María Jaramillo
%B Proceedings of Fourth European Workshop on Algorithmic Fairness
%C Proceedings of Machine Learning Research
%D 2025
%E Hilde Weerts
%E Mykola Pechenizkiy
%E Doris Allhutter
%E Ana Maria Corrêa
%E Thomas Grote
%E Cynthia Liem
%F pmlr-v294-nauta25a
%I PMLR
%P 481--486
%U https://proceedings.mlr.press/v294/nauta25a.html
%V 294
%X In this work, we investigate the socio-demographic factors influencing the production of influential Artificial Intelligence (AI) and Machine Learning (ML) research. Building on prior work that identified a predominance of power-centralizing values and an underrepresentation of user rights and ethical principles in AI & ML publications, this study analyzes whether the socio-demographic composition of authors influences the prevalence of these values. We first analyze an initial dataset of seed publications: the most cited publications presented at the top-tier conferences NeurIPS and ICML in four selected years (2008, 2009, 2018, and 2019). We then construct an enriched dataset of all publications from the same conferences and years using open-access research platforms such as Semantic Scholar and OpenAlex. Publications in the enriched dataset are identified as closely related to one of two groups derived from the initial annotations of the seed publications: (i) a moral group and (ii) a non-moral group. This is achieved by computing the Jaccard similarity between the reference lists of pairs of publications, constructing a similarity-based network from these scores, and then applying backbone extraction and ego-network extraction. Diversity scores are calculated for each research collaboration, enabling a statistical comparison between the two groups of publications. Results from human validation reveal that, although the method successfully constructs a similarity-based measure, it does not reliably infer shared moral values: publications closely tied to a publication categorized as moral do not necessarily share its values, despite a high overlap in their references. Additional results show that the diversity characteristics of research collaborations in both groups have no statistically significant relationship with the moral classification. Although there is some diversity, the overall observations show a significant underrepresentation of women and a concentration of researchers from a few nationalities, elite institutions, and technology companies, predominantly from the Global North.
APA
Nauta, P., Karimi, F., & Jaramillo, A. M. (2025). How is the Socio-Demographic Background of Researchers in AI & ML Related to the Values reflected in their Research? Proceedings of Fourth European Workshop on Algorithmic Fairness, in Proceedings of Machine Learning Research 294:481-486. Available from https://proceedings.mlr.press/v294/nauta25a.html
