How is the Socio-Demographic Background of Researchers in AI & ML Related to the Values reflected in their Research?
Proceedings of Fourth European Workshop on Algorithmic Fairness, PMLR 294:481-486, 2025.
Abstract
In this work, we investigate the socio-demographic factors influencing the production of influential Artificial Intelligence (AI) and Machine Learning (ML) research. Building on prior work, which identified a predominance of power-centralizing values and an underrepresentation of user rights and ethical principles in AI & ML publications, this study analyzes whether the socio-demographic composition of authors influences the prevalence of these values. An initial dataset (seed publications), consisting of the most cited publications presented at the top-tier conferences NeurIPS and ICML in four selected years (2008, 2009, 2018, and 2019), was analyzed. Then, an enriched dataset with all publications in the same conferences and years is constructed from open-access research platforms such as Semantic Scholar and OpenAlex. Publications are identified as closely related to one of two groups derived from initial annotations in the seed publications: (i) moral group and (ii) non-moral group. This is achieved by computing the Jaccard similarity of the reference overlap between publications and constructing a similarity-based network, followed by backbone extraction and ego network extraction. Diversity scores for research collaborations are calculated, enabling a statistical analysis of the two groups of publications. Results from human validation reveal that although the developed method successfully constructs a similarity-based measure, it does not reliably infer shared moral values. Publications closely tied to a publication categorized as moral do not necessarily share the same values, despite having a high overlap based on shared references. Additional results show that the diversity characteristics of research collaborations in both groups do not have a statistically significant relationship with the moral classification.
While there is some diversity, the general observations show a significant underrepresentation of women and a concentration of researchers from a few nationalities, elite institutions, and technology companies, predominantly from the Global North.
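
As an illustration of the reference-overlap step described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each publication is represented by a set of reference identifiers. The names `jaccard`, `similarity_edges`, `ego_network`, the `threshold` parameter, and the toy data are all hypothetical, and the simple threshold filter is only a stand-in for the backbone extraction mentioned above.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two reference sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(references, threshold=0.0):
    """Build weighted edges between publications that share references.

    `references` maps a publication id to its set of cited reference ids.
    The threshold is an assumed simplification, not a backbone method.
    """
    edges = {}
    for p, q in combinations(sorted(references), 2):
        score = jaccard(references[p], references[q])
        if score > threshold:
            edges[(p, q)] = score
    return edges


def ego_network(edges, seed):
    """Keep only the edges that touch the given seed publication."""
    return {pair: w for pair, w in edges.items() if seed in pair}


# Toy data (hypothetical): one seed publication and two candidates.
refs = {
    "seed": {"r1", "r2", "r3"},
    "paper_a": {"r2", "r3", "r4"},
    "paper_b": {"r5"},
}
edges = similarity_edges(refs)
seed_ego = ego_network(edges, "seed")
```

In this toy example, `paper_a` shares two of four distinct references with the seed, so it enters the seed's ego network, while `paper_b` shares none and is left out. As the abstract notes, a high overlap of this kind indicates related literature and does not by itself imply shared moral values.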