Fares on Fairness: Using a Total Error Framework to Examine the Role of Measurement and Representation in Training Data on Model Fairness and Bias

Patrick Oliver Schenk, Christoph Kern, Trent D. Buskirk
Proceedings of Fourth European Workshop on Algorithmic Fairness, PMLR 294:187-211, 2025.

Abstract

Data-driven decisions, often based on predictions from machine learning (ML) models, are becoming ubiquitous. For these decisions to be just, the underlying ML models must be fair, i.e., work equally well for all parts of the population, such as groups defined by gender or age. What, however, are the logical next steps if a trained model is accurate but not fair? How can we guide the whole data pipeline such that we avoid training unfair models based on inadequate data, recognizing possible sources of unfairness early on? How can the concepts of data-based sources of unfairness that exist in the fair ML literature be organized, perhaps in a way that yields new insight? In this paper, we explore two total error frameworks from the social sciences, Total Survey Error and its generalization, Total Data Quality, to help elucidate issues related to fairness and trace its antecedents. The goal of this thought piece is to acquaint the fair ML community with these two frameworks, discussing errors of measurement and errors of representation through their organized structure. We illustrate how they may be useful, both practically and conceptually.

Cite this Paper


BibTeX
@InProceedings{pmlr-v294-schenk25a,
  title     = {Fares on Fairness: Using a Total Error Framework to Examine the Role of Measurement and Representation in Training Data on Model Fairness and Bias},
  author    = {Schenk, Patrick Oliver and Kern, Christoph and Buskirk, Trent D.},
  booktitle = {Proceedings of Fourth European Workshop on Algorithmic Fairness},
  pages     = {187--211},
  year      = {2025},
  editor    = {Weerts, Hilde and Pechenizkiy, Mykola and Allhutter, Doris and Corrêa, Ana Maria and Grote, Thomas and Liem, Cynthia},
  volume    = {294},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--02 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v294/main/assets/schenk25a/schenk25a.pdf},
  url       = {https://proceedings.mlr.press/v294/schenk25a.html},
  abstract  = {Data-driven decisions, often based on predictions from machine learning (ML) models, are becoming ubiquitous. For these decisions to be just, the underlying ML models must be fair, i.e., work equally well for all parts of the population, such as groups defined by gender or age. What, however, are the logical next steps if a trained model is accurate but not fair? How can we guide the whole data pipeline such that we avoid training unfair models based on inadequate data, recognizing possible sources of unfairness early on? How can the concepts of data-based sources of unfairness that exist in the fair ML literature be organized, perhaps in a way that yields new insight? In this paper, we explore two total error frameworks from the social sciences, Total Survey Error and its generalization, Total Data Quality, to help elucidate issues related to fairness and trace its antecedents. The goal of this thought piece is to acquaint the fair ML community with these two frameworks, discussing errors of measurement and errors of representation through their organized structure. We illustrate how they may be useful, both practically and conceptually.}
}
APA
Schenk, P.O., Kern, C. & Buskirk, T.D. (2025). Fares on Fairness: Using a Total Error Framework to Examine the Role of Measurement and Representation in Training Data on Model Fairness and Bias. Proceedings of Fourth European Workshop on Algorithmic Fairness, in Proceedings of Machine Learning Research 294:187-211. Available from https://proceedings.mlr.press/v294/schenk25a.html.