Detecting Phishing Emails in Nigerian Pidgin English Using a Dialect-Aware and Behavioural NLP Model

Zubaida Muhtar Alhassan

Proceedings of IndabaX Nigeria 2026: Building Scalable AI That Works: From Research to Deployment in Resource-Constrained Environments, PMLR 319:144-154, 2026.

Abstract

This study proposes a dialect-aware and behaviourally informed NLP model for detecting phishing emails in Nigerian Pidgin, spoken by over 100 million people in Nigeria. A balanced dataset of 870 emails was created using a hybrid translation and generation process, validated by native speakers. The model combines TF-IDF-based linguistic features with seven behavioural indicators derived from persuasion theory, optimised via a Genetic Algorithm-tuned Random Forest classifier. The system achieved 93.89% accuracy, 100.00% precision, and 87.69% recall, demonstrating the importance of integrating behavioural and linguistic analysis for cybersecurity in low-resource language contexts.

Cite this Paper

BibTeX

@InProceedings{pmlr-v319-alhassan26a,
  title = 	 {Detecting Phishing Emails in {Nigerian} {Pidgin} {English} Using a Dialect-Aware and Behavioural {NLP} Model},
  author =       {Alhassan, Zubaida Muhtar},
  booktitle = 	 {Proceedings of IndabaX Nigeria 2026: Building Scalable AI That Works: From Research to Deployment in Resource-Constrained Environments},
  pages = 	 {144--154},
  year = 	 {2026},
  editor = 	 {Folorunso, Sakinat and Ogundokun, Roseline and Oladipo, Francisca},
  volume = 	 {319},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {11--14 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v319/main/assets/alhassan26a/alhassan26a.pdf},
  url = 	 {https://proceedings.mlr.press/v319/alhassan26a.html},
  abstract = 	 {This study proposes a dialect-aware and behaviourally informed NLP model for detecting phishing emails in Nigerian Pidgin, spoken by over 100 million people in Nigeria. A balanced dataset of 870 emails was created using a hybrid translation and generation process, validated by native speakers. The model combines TF-IDF-based linguistic features with seven behavioural indicators derived from persuasion theory, optimised via a Genetic Algorithm-tuned Random Forest classifier. The system achieved 93.89% accuracy, 100.00% precision, and 87.69% recall, demonstrating the importance of integrating behavioural and linguistic analysis for cybersecurity in low-resource language contexts.}
}

Endnote

%0 Conference Paper
%T Detecting Phishing Emails in Nigerian Pidgin English Using a Dialect-Aware and Behavioural NLP Model
%A Zubaida Muhtar Alhassan
%B Proceedings of IndabaX Nigeria 2026: Building Scalable AI That Works: From Research to Deployment in Resource-Constrained Environments
%C Proceedings of Machine Learning Research
%D 2026
%E Sakinat Folorunso
%E Roseline Ogundokun
%E Francisca Oladipo
%F pmlr-v319-alhassan26a
%I PMLR
%P 144--154
%U https://proceedings.mlr.press/v319/alhassan26a.html
%V 319
%X This study proposes a dialect-aware and behaviourally informed NLP model for detecting phishing emails in Nigerian Pidgin, spoken by over 100 million people in Nigeria. A balanced dataset of 870 emails was created using a hybrid translation and generation process, validated by native speakers. The model combines TF-IDF-based linguistic features with seven behavioural indicators derived from persuasion theory, optimised via a Genetic Algorithm-tuned Random Forest classifier. The system achieved 93.89% accuracy, 100.00% precision, and 87.69% recall, demonstrating the importance of integrating behavioural and linguistic analysis for cybersecurity in low-resource language contexts.

APA

Alhassan, Z. M. (2026). Detecting Phishing Emails in Nigerian Pidgin English Using a Dialect-Aware and Behavioural NLP Model. Proceedings of IndabaX Nigeria 2026: Building Scalable AI That Works: From Research to Deployment in Resource-Constrained Environments, in Proceedings of Machine Learning Research 319:144-154. Available from https://proceedings.mlr.press/v319/alhassan26a.html.
