Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics

Jarren Briscoe, Garrett Kepler, Daryl Robert DeFord, Assefaw Gebremedhin
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4915-4923, 2025.

Abstract

Evaluating machine learning models is crucial not only for determining their technical accuracy but also for assessing their potential societal implications. While the potential for low-sample-size bias in algorithms is well known, we demonstrate the significance of sample-size bias induced by combinatorics in classification metrics. This revelation challenges the efficacy of these metrics in assessing bias with high resolution, especially when comparing groups of disparate sizes, which frequently arise in social applications. We provide analyses of the bias that appears in several commonly applied metrics and propose a model-agnostic assessment and correction technique. Additionally, we analyze counts of undefined cases in metric calculations, which can lead to misleading evaluations if improperly handled. This work illuminates the previously unrecognized challenge of combinatorics and probability in standard evaluation practices and thereby advances approaches for fair and trustworthy classification.
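
To make the combinatorial effect concrete: for a group containing P ground-truth positives, recall = TP/P can only take the P + 1 values 0/P, 1/P, ..., P/P, so a small group can only express metric values on a coarse grid; likewise, precision = TP/(TP + FP) is undefined whenever a classifier makes no positive predictions, which becomes more likely as samples shrink. The Python sketch below illustrates both points under a simple random-prediction null model; it is our own illustrative example, not the assessment or correction technique proposed in the paper, and all function names are hypothetical.

    import random
    from fractions import Fraction

    def attainable_recalls(num_positives):
        # Recall = TP / P, and TP can only be 0, 1, ..., P:
        # a grid of exactly P + 1 attainable values.
        return [Fraction(k, num_positives) for k in range(num_positives + 1)]

    # Smaller groups admit only a coarse grid of metric values, so groups of
    # disparate sizes cannot, in general, realize the same recall at all.
    for P in (3, 5, 50):
        grid = attainable_recalls(P)
        print(f"P = {P}: {len(grid)} attainable recall values, spacing 1/{P}")

    def undefined_precision_rate(n, p_pred=0.1, trials=50_000, seed=0):
        # Fraction of simulated classifiers (each predicts positive with
        # probability p_pred, independently per sample) that make zero
        # positive predictions on n samples, leaving precision undefined.
        rng = random.Random(seed)
        undefined = sum(
            all(rng.random() >= p_pred for _ in range(n)) for _ in range(trials)
        )
        return undefined / trials

    for n in (5, 20, 100):
        rate = undefined_precision_rate(n)
        print(f"n = {n}: precision undefined in ~{rate:.1%} of simulated classifiers")

Under this null model the exact no-positive-prediction probability is (1 - p_pred)^n, which falls from about 59% at n = 5 to roughly 12% at n = 20 and about 0.003% at n = 100; the simulation simply makes the sample-size dependence of undefined cases visible.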

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-briscoe25a,
  title     = {Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics},
  author    = {Briscoe, Jarren and Kepler, Garrett and DeFord, Daryl Robert and Gebremedhin, Assefaw},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {4915--4923},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/briscoe25a/briscoe25a.pdf},
  url       = {https://proceedings.mlr.press/v258/briscoe25a.html}
}
Endnote
%0 Conference Paper
%T Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics
%A Jarren Briscoe
%A Garrett Kepler
%A Daryl Robert DeFord
%A Assefaw Gebremedhin
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-briscoe25a
%I PMLR
%P 4915--4923
%U https://proceedings.mlr.press/v258/briscoe25a.html
%V 258
APA
Briscoe, J., Kepler, G., DeFord, D.R. & Gebremedhin, A. (2025). Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4915-4923. Available from https://proceedings.mlr.press/v258/briscoe25a.html.
