Decision-margin consistency: a principled metric for human and machine performance alignment

George A. Alvarez, Talia Konkle
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:24-39, 2024.

Abstract

Understanding the alignment between human and machine perceptual decision-making is a fundamental challenge. While most current vision deep neural networks are deterministic and produce consistent outputs for the same input, human perceptual decisions are notoriously noisy. This noise can originate from perceptual encoding, decision processes, or even attentional fluctuations, leading to different responses to the same stimulus across trials. Thus, any meaningful comparison of human-to-human or human-to-machine decisions must take this internal noise into account to avoid underestimating alignment. In this paper, we introduce the decision-margin consistency metric, which draws on signal detection theory to incorporate both the variability in decision difficulty across items and the noise in human responses. By focusing on decision-margin distances (continuous measures of the signal strength underlying binary outcomes), our method can be applied to both model and human systems to capture nuanced agreement in item-level difficulty. Applying this metric to existing visual categorization datasets reveals a dramatic increase in human-human agreement relative to the standard error-consistency metric. In contrast, human-to-machine agreement showed only a modest increase, highlighting an even larger representational gap between these systems on these challenging perceptual decisions. Broadly, this work underscores the importance of accounting for internal noise when comparing human and machine error patterns, and offers a new, principled metric for measuring representational alignment between biological and artificial systems.
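The paper itself is not reproduced on this page, so the exact formulation of the metric is not shown here; the Python sketch below is only an illustrative reading of the abstract, under stated assumptions. It treats a model's decision margin as the signed gap between the correct-class logit and its strongest competitor, treats a human decision margin as a probit (signal-detection-style) transform of per-item accuracy across repeated binary responses, and scores consistency as the rank correlation of these item-level margins between two systems. All function and variable names (model_decision_margins, human_correct, and so on) are hypothetical placeholders, not the authors' code.

# Illustrative sketch only: one plausible way to compare item-level difficulty
# via continuous "decision margins", assuming
#   - model margin = correct-class logit minus the strongest competing logit, and
#   - human margin = probit transform of per-item proportion correct across
#     repeated binary (correct/incorrect) responses.
import numpy as np
from scipy.stats import norm, spearmanr

def model_decision_margins(model_logits: np.ndarray, true_labels: np.ndarray) -> np.ndarray:
    """Signed margin per item: correct-class logit minus the best competitor."""
    n_items = model_logits.shape[0]
    correct = model_logits[np.arange(n_items), true_labels]
    competitors = model_logits.copy()
    competitors[np.arange(n_items), true_labels] = -np.inf
    return correct - competitors.max(axis=1)

def human_decision_margins(human_correct: np.ndarray) -> np.ndarray:
    """Probit-transformed per-item accuracy (array of shape items x repeats, 0/1)."""
    p = human_correct.mean(axis=1)
    p = np.clip(p, 1e-3, 1 - 1e-3)   # avoid infinite z-scores at 0% or 100% correct
    return norm.ppf(p)               # signal-detection-style z transform

def decision_margin_consistency(margins_a: np.ndarray, margins_b: np.ndarray) -> float:
    """Rank correlation of item-level margins between two systems."""
    rho, _ = spearmanr(margins_a, margins_b)
    return rho

On this reading, a higher correlation means the two systems find the same items easy and the same items hard, regardless of the binary correct/incorrect outcome on any single trial.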

Cite this Paper


BibTeX
@InProceedings{pmlr-v285-alvarez24a,
  title     = {Decision-margin consistency: a principled metric for human and machine performance alignment},
  author    = {Alvarez, George A. and Konkle, Talia},
  booktitle = {Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models},
  pages     = {24--39},
  year      = {2024},
  editor    = {Fumero, Marco and Domine, Clementine and Lähner, Zorah and Crisostomi, Donato and Moschella, Luca and Stachenfeld, Kimberly},
  volume    = {285},
  series    = {Proceedings of Machine Learning Research},
  month     = {14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v285/main/assets/alvarez24a/alvarez24a.pdf},
  url       = {https://proceedings.mlr.press/v285/alvarez24a.html}
}
APA
Alvarez, G.A. & Konkle, T. (2024). Decision-margin consistency: a principled metric for human and machine performance alignment. Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 285:24-39. Available from https://proceedings.mlr.press/v285/alvarez24a.html.