Fairness-Aware Machine Learning for Social Bias Detection in Healthcare Research Datasets

Precious Kolawole
DLI 2025 Research Track, PMLR 302:1-10, 2026.

Abstract

This work presents an automated tool for detecting and measuring bias in healthcare datasets and predictive models. We evaluated fairness at both the data and algorithmic levels using metrics including Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), and Demographic Disparity. Using the SyntheticMass (healthcare expenses) and Brain Stroke healthcare datasets, we found that SyntheticMass showed substantial demographic imbalance (83.6% White patients) and age-based disparities (SPD: 0.82 for younger vs. elderly patients). While the Brain Stroke dataset exhibited more balanced demographics, we identified substantial disparities in stroke outcomes between age groups. Across both datasets, neural networks consistently outperformed traditional machine learning models on fairness metrics. In the Brain Stroke dataset, neural networks achieved both higher accuracy (94.8% vs. 91.8% for the best traditional model) and nearly perfect fairness scores (SPD: 0.000–0.0007; EOD: 0.000–0.0128). Additionally, we introduced a combined scoring metric that equally weights accuracy and fairness, providing researchers with a practical framework for model selection that prioritizes both dimensions. The interactive visualization dashboard makes fairness analysis accessible to medical researchers without specialized knowledge of fairness-aware machine learning. Keywords: healthcare bias, fairness evaluation, machine learning, neural networks, statistical parity, equal opportunity, demographic disparity.
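The fairness metrics named in the abstract have standard definitions that can be sketched briefly. The snippet below is an illustrative implementation, not the paper's code: SPD compares positive-prediction rates across two groups, EOD compares true-positive rates, and the combined score follows the abstract's description of equally weighting accuracy and fairness (the exact fairness aggregation used in the paper may differ).

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """SPD: absolute gap in positive-prediction rates between groups 0 and 1."""
    rate0 = y_pred[group == 0].mean()
    rate1 = y_pred[group == 1].mean()
    return abs(rate0 - rate1)

def equal_opportunity_difference(y_true, y_pred, group):
    """EOD: absolute gap in true-positive rates between groups 0 and 1."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

def combined_score(accuracy, spd, eod):
    """Equal-weight blend of accuracy and fairness, taking fairness as
    1 minus the mean disparity (an assumption for illustration)."""
    fairness = 1.0 - (spd + eod) / 2.0
    return 0.5 * accuracy + 0.5 * fairness
```

Under these definitions, a perfectly fair model (SPD = EOD = 0) scores `0.5 * accuracy + 0.5`, so the near-zero disparities reported for the neural networks translate directly into high combined scores.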

Cite this Paper


BibTeX
@InProceedings{pmlr-v302-kolawole26a,
  title     = {Fairness-Aware Machine Learning for Social Bias Detection in Healthcare Research Datasets},
  author    = {Kolawole, Precious},
  booktitle = {DLI 2025 Research Track},
  pages     = {1--10},
  year      = {2026},
  editor    = {Haddad, Hatem and Kahira, Albert Njoroge and Bourhim, Sofia and Olatunji, Iyiola Emmanuel and Makhafola, Lesego and Mwase, Christine},
  volume    = {302},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--22 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v302/main/assets/kolawole26a/kolawole26a.pdf},
  url       = {https://proceedings.mlr.press/v302/kolawole26a.html},
  abstract  = {This work presents an automated tool for detecting and measuring bias in healthcare datasets and predictive models. We evaluated fairness at both the data and algorithmic levels using metrics including Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), and Demographic Disparity. Using the SyntheticMass (healthcare expenses) and Brain Stroke healthcare datasets, we found that SyntheticMass showed substantial demographic imbalance (83.6% White patients) and age-based disparities (SPD: 0.82 for younger vs. elderly patients). While the Brain Stroke dataset exhibited more balanced demographics, we identified substantial disparities in stroke outcomes between age groups. Across both datasets, neural networks consistently outperformed traditional machine learning models on fairness metrics. In the Brain Stroke dataset, neural networks achieved both higher accuracy (94.8% vs. 91.8% for the best traditional model) and nearly perfect fairness scores (SPD: 0.000–0.0007; EOD: 0.000–0.0128). Additionally, we introduced a combined scoring metric that equally weights accuracy and fairness, providing researchers with a practical framework for model selection that prioritizes both dimensions. The interactive visualization dashboard makes fairness analysis accessible to medical researchers without specialized knowledge of fairness-aware machine learning. Keywords: healthcare bias, fairness evaluation, machine learning, neural networks, statistical parity, equal opportunity, demographic disparity.}
}
Endnote
%0 Conference Paper
%T Fairness-Aware Machine Learning for Social Bias Detection in Healthcare Research Datasets
%A Precious Kolawole
%B DLI 2025 Research Track
%C Proceedings of Machine Learning Research
%D 2026
%E Hatem Haddad
%E Albert Njoroge Kahira
%E Sofia Bourhim
%E Iyiola Emmanuel Olatunji
%E Lesego Makhafola
%E Christine Mwase
%F pmlr-v302-kolawole26a
%I PMLR
%P 1--10
%U https://proceedings.mlr.press/v302/kolawole26a.html
%V 302
%X This work presents an automated tool for detecting and measuring bias in healthcare datasets and predictive models. We evaluated fairness at both the data and algorithmic levels using metrics including Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), and Demographic Disparity. Using the SyntheticMass (healthcare expenses) and Brain Stroke healthcare datasets, we found that SyntheticMass showed substantial demographic imbalance (83.6% White patients) and age-based disparities (SPD: 0.82 for younger vs. elderly patients). While the Brain Stroke dataset exhibited more balanced demographics, we identified substantial disparities in stroke outcomes between age groups. Across both datasets, neural networks consistently outperformed traditional machine learning models on fairness metrics. In the Brain Stroke dataset, neural networks achieved both higher accuracy (94.8% vs. 91.8% for the best traditional model) and nearly perfect fairness scores (SPD: 0.000–0.0007; EOD: 0.000–0.0128). Additionally, we introduced a combined scoring metric that equally weights accuracy and fairness, providing researchers with a practical framework for model selection that prioritizes both dimensions. The interactive visualization dashboard makes fairness analysis accessible to medical researchers without specialized knowledge of fairness-aware machine learning. Keywords: healthcare bias, fairness evaluation, machine learning, neural networks, statistical parity, equal opportunity, demographic disparity.
APA
Kolawole, P. (2026). Fairness-Aware Machine Learning for Social Bias Detection in Healthcare Research Datasets. DLI 2025 Research Track, in Proceedings of Machine Learning Research 302:1-10. Available from https://proceedings.mlr.press/v302/kolawole26a.html.
