Detecting Phishing Emails in Nigerian Pidgin English Using a Dialect-Aware and Behavioural NLP Model
Proceedings of IndabaX Nigeria 2026: Building Scalable AI That Works: From Research to Deployment in Resource-Constrained Environments, PMLR 319:144-154, 2026.
Abstract
This study proposes a dialect-aware and behaviourally informed NLP model for detecting phishing emails in Nigerian Pidgin, which is spoken by over 100 million people in Nigeria. A balanced dataset of 870 emails was created through a hybrid translation and generation process and validated by native speakers. The model combines TF-IDF-based linguistic features with seven behavioural indicators derived from persuasion theory and classifies emails with a Random Forest tuned by a Genetic Algorithm. The system achieved 93.89% accuracy, 100.00% precision, and 87.69% recall, indicating the value of integrating behavioural and linguistic analysis for cybersecurity in low-resource language contexts.
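To make the relationship between the three reported metrics concrete, the following minimal sketch computes accuracy, precision, and recall from confusion-matrix counts. The counts used here are hypothetical: they are not taken from the paper and were chosen only because they reproduce the reported percentages. The function name `classification_metrics` is likewise an illustrative assumption. The sketch shows that 100.00% precision corresponds to zero false positives (no legitimate email flagged), while 87.69% recall means some phishing emails were still missed.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall from confusion-matrix counts.

    tp: phishing emails correctly flagged
    fp: legitimate emails wrongly flagged
    fn: phishing emails missed
    tn: legitimate emails correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts (NOT from the paper), picked to match the reported figures.
metrics = classification_metrics(tp=57, fp=0, fn=8, tn=66)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, the gap between precision and recall comes entirely from the false negatives: the classifier never raises a false alarm but lets a fraction of phishing emails through, which is the trade-off the reported numbers describe.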