Facial Demography Analysis of the LAION Dataset
Proceedings of Fourth European Workshop on Algorithmic Fairness, PMLR 294:357-361, 2025.
Abstract
Large-scale image-text datasets have become fundamental building blocks for modern AI systems, raising concerns about the demographic biases they may encode and propagate. We present a comprehensive analysis of LAION, one of the largest and most influential datasets in this domain, focusing on demographic representation and intersectional biases across age, gender, and race. Our methodology combines a state-of-the-art face detector (RetinaFace) with specialized demographic classifiers (FairFace and EMO-AffectNet) to analyze a random sample of 500,000 image URLs from ReLAION-2B-en, yielding over 37,000 faces. We analyze both general representational biases, revealing severe overrepresentation of certain groups (such as White people and individuals aged 20-29), and intersectional biases, notably the underrepresentation of women over 30 years old and of non-White infants. These results highlight the importance of considering not only individual demographic attributes but also their intersections when evaluating and mitigating bias in large-scale datasets.
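The abstract does not state how intersectional under- or overrepresentation is quantified. One common way to make the idea concrete is to compare each observed joint share P(a, b) with the share P(a)·P(b) expected if the two attributes were independent. This is shown only as a sketch under that assumption, not as the authors' method. The per-face records and the helper names `marginal_shares` and `intersectional_ratio` are hypothetical. In the pipeline described above, the labels would come from FairFace predictions on RetinaFace detections.

```python
from collections import Counter
from itertools import product

# Hypothetical per-face labels (toy data, not results from the paper).
# In the described pipeline these would be classifier outputs per detected face.
sample_faces = [
    {"age": "20-29", "gender": "Female", "race": "White"},
    {"age": "20-29", "gender": "Female", "race": "White"},
    {"age": "30-39", "gender": "Male", "race": "White"},
    {"age": "30-39", "gender": "Male", "race": "Black"},
]


def marginal_shares(faces, attr):
    """Fraction of faces carrying each observed value of a single attribute."""
    counts = Counter(face[attr] for face in faces)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def intersectional_ratio(faces, attr_a, attr_b):
    """Observed joint share divided by the share expected under independence.

    Assumed metric for illustration: a ratio below 1 means the combination is
    rarer than the two marginal distributions alone would predict.
    """
    pa = marginal_shares(faces, attr_a)
    pb = marginal_shares(faces, attr_b)
    joint = Counter((face[attr_a], face[attr_b]) for face in faces)
    total = len(faces)
    # Marginal shares are always > 0 here because they come from observed values.
    return {
        (a, b): (joint[(a, b)] / total) / (pa[a] * pb[b])
        for a, b in product(pa, pb)
    }


if __name__ == "__main__":
    for (age, gender), ratio in sorted(
        intersectional_ratio(sample_faces, "age", "gender").items()
    ):
        print(f"{age:>6} {gender:<7} {ratio:.2f}")
```

In an analysis like the one summarized above, a ratio well below 1 for a combination such as women aged 30 and over would flag a group that is rarer than the marginal age and gender distributions suggest. A per-attribute count alone can miss this kind of effect.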