ToxiSight: Leveraging Moderator Expertise Through Behavioral Measurement in Gaming Toxicity Annotation
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:650-661, 2026.
Abstract
Content moderation systems commonly treat human annotators as interchangeable label sources, resolving disagreements through majority voting or expert arbitration. We present ToxiSight, an annotation platform that challenges this assumption: rather than extracting consensus, the system supports moderator reasoning by treating hesitation, revision, and disagreement as signals that reveal where content is genuinely ambiguous and where taxonomic guidelines fail. ToxiSight integrates gaming-specific contextual widgets with behavioral telemetry, capturing traces of the cognitive processes underlying toxicity validation decisions. Through a deployment with 10 professional moderators across 60,000 lines of gaming chat, we demonstrate that behavioral patterns expose systematic category failures invisible to traditional inter-annotator agreement metrics. The Controversial category shows a 72% revision rate with fast processing times, indicating immediate recognition of definitional breakdown, while Threats (Life-Threatening) exhibits a 75% revision rate with slow processing, signaling genuine interpretive complexity. Completion rates improved from 60% to 95%, and moderators reported reduced decision stress when permitted to express uncertainty. This case study demonstrates that trustworthy toxicity detection requires annotation systems designed around the irreducible complexity of human judgment, not against it.
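The abstract distinguishes categories by combining two telemetry signals: how often moderators revise their labels and how long they take to decide. The sketch below illustrates one way such a per-category profile could be computed. It is not the authors' implementation: the `AnnotationEvent` record, the `category_profiles` function, the median-time statistic, and both thresholds (`revision_threshold=0.5`, `fast_seconds=5.0`) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class AnnotationEvent:
    # Hypothetical telemetry record: one moderator decision on one chat line.
    category: str   # toxicity category the decision concerns
    revised: bool   # True if the moderator changed their initial label
    seconds: float  # time spent reaching the final decision


def category_profiles(events, revision_threshold=0.5, fast_seconds=5.0):
    """Summarize revision rate and median decision time per category.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    grouped = defaultdict(list)
    for event in events:
        grouped[event.category].append(event)

    profiles = {}
    for category, evs in grouped.items():
        # Fraction of decisions in this category that were revised.
        rate = sum(e.revised for e in evs) / len(evs)
        # Median is less sensitive than the mean to a few very long decisions.
        med = median(e.seconds for e in evs)
        if rate < revision_threshold:
            signal = "stable"
        elif med <= fast_seconds:
            # Revised often but decided quickly: guideline likely unclear.
            signal = "definitional breakdown"
        else:
            # Revised often and deliberated slowly: content genuinely hard.
            signal = "interpretive complexity"
        profiles[category] = {
            "revision_rate": rate,
            "median_seconds": med,
            "signal": signal,
        }
    return profiles
```

Fed a list of events, the function returns one profile per category; a high-revision, fast category is flagged differently from a high-revision, slow one, mirroring the Controversial versus Threats (Life-Threatening) contrast described above. Any real deployment would need thresholds calibrated against its own telemetry.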