Evaluating Fairness in AI-Assisted Remote Proctoring
Proceedings of the Innovation and Responsibility in AI-Supported Education Workshop, PMLR 273:125-132, 2025.
Abstract
Remote proctors make decisions about whether test takers have violated testing rules and, as a result, whether to certify test takers’ scores. These decisions rely on both AI signals and human evaluation of test-taking behaviors. Given that fairness is a key component of test validity evidence, it is critical that proctors’ decisions be unbiased with respect to proctor and test-taker background characteristics (e.g., gender, age, and nationality). In this study, we empirically evaluate whether proctor or test-taker background characteristics affect whether a test taker is flagged for rule violations. Results suggest that proctor and test-taker nationality may influence proctoring decisions, whereas gender and age do not. The direction of the influence generally reflects an “in-group, out-group” bias: proctors are less likely to flag rule violations among test takers who share their nationality (in-group favoring) and more likely to flag violations among test takers of a different nationality (out-group disfavoring). Results also suggest that decisions based on AI signals may be less prone to in-group/out-group bias than decisions based on human evaluation alone, although more research is needed to support this finding.
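The abstract does not detail the statistical analysis, but the in-group/out-group question lends itself to a simple illustration. The sketch below, run on synthetic data, fits a logistic regression of flag decisions on a hypothetical same_nationality indicator and its interaction with an ai_signal flag; all column names, effect sizes, and the model itself are assumptions for illustration, not the paper’s actual data or method.

```python
# Illustrative sketch only: the paper does not specify its model, so this
# assumes a logistic regression of flag decisions on a hypothetical
# same_nationality indicator, with an ai_signal interaction to probe whether
# AI-informed decisions show less in-group/out-group bias.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000

# Synthetic stand-in data; column names are assumptions, not the paper's schema.
df = pd.DataFrame({
    # 1 if the proctor and test taker share a nationality, else 0
    "same_nationality": rng.integers(0, 2, n),
    # 1 if an AI signal (e.g., an automated anomaly flag) preceded the decision
    "ai_signal": rng.integers(0, 2, n),
})

# Simulate in-group favoring (negative same_nationality effect) that is
# attenuated when an AI signal is present.
logit_p = (-1.0
           - 0.5 * df.same_nationality
           + 0.8 * df.ai_signal
           + 0.4 * df.same_nationality * df.ai_signal)
df["flagged"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# A negative same_nationality coefficient indicates in-group favoring; a
# positive interaction term indicates the bias shrinks when AI signals are used.
model = smf.logit("flagged ~ same_nationality * ai_signal", data=df).fit(disp=False)
print(model.summary())
```

On this synthetic data, the fitted coefficients recover the simulated pattern: a negative main effect for same_nationality (in-group favoring) and a positive interaction with ai_signal, consistent with the abstract’s suggestion that AI-informed decisions may be less prone to in-group/out-group bias.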