KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration

Mohammad Amanlou; Erfan Shafiee Moghaddam; Mahdi Nouri; Yasaman Amou Jafary; Farhan Farsi; Behnam Bahrak

KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration

Mohammad Amanlou, Erfan Shafiee Moghaddam, Mahdi Nouri, Yasaman Amou Jafary, Farhan Farsi, Behnam Bahrak

Conference on Parsimony and Learning, PMLR 328:989-1024, 2026.

Abstract

With the rise of large language models (LLMs), they have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the time and cost of building specialized assessment datasets. We introduce KNIGHT, an LLM-based, knowledge-graph-driven framework for generating multiple-choice question (MCQ) datasets from external sources. KNIGHT constructs a topic-specific knowledge graph, a structured, parsimonious summary of entities and relations, that can be reused to generate instructor-controlled difficulty levels, including multi-hop questions, without repeatedly re-feeding the full source text. This KG acts as a compressed, reusable state, making question generation a cheap read over the graph. We instantiate KNIGHT on Wikipedia/Wikidata, while keeping the framework domain- and ontology-agnostic. As a case study, KNIGHT produces six MCQ datasets in History, Biology, and Mathematics. We evaluate quality on five criteria: fluency, unambiguity (single correct answer), topic relevance, option uniqueness, and answerability given the provided sources (as a proxy for hallucination). Results show that KNIGHT enables token- and cost-efficient generation from a reusable KG representation, achieves high quality across these criteria, and yields model rankings aligned with MMLU-style benchmarks, while supporting topic-specific and difficulty-controlled evaluation.

Cite this Paper

BibTeX

@InProceedings{pmlr-v328-amanlou26a,
  title = 	 {KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration},
  author =       {Amanlou, Mohammad and Moghaddam, Erfan Shafiee and Nouri, Mahdi and Jafary, Yasaman Amou and Farsi, Farhan and Bahrak, Behnam},
  booktitle = 	 {Conference on Parsimony and Learning},
  pages = 	 {989--1024},
  year = 	 {2026},
  editor = 	 {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume = 	 {328},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--26 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v328/main/assets/amanlou26a/amanlou26a.pdf},
  url = 	 {https://proceedings.mlr.press/v328/amanlou26a.html},
  abstract = 	 {With the rise of large language models (LLMs), they have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the time and cost of building specialized assessment datasets. We introduce KNIGHT, an LLM-based, knowledge-graph-driven framework for generating multiple-choice question (MCQ) datasets from external sources. KNIGHT constructs a topic-specific knowledge graph, a structured, parsimonious summary of entities and relations, that can be reused to generate instructor-controlled difficulty levels, including multi-hop questions, without repeatedly re-feeding the full source text. This KG acts as a compressed, reusable state, making question generation a cheap read over the graph. We instantiate KNIGHT on Wikipedia/Wikidata, while keeping the framework domain- and ontology-agnostic. As a case study, KNIGHT produces six MCQ datasets in History, Biology, and Mathematics. We evaluate quality on five criteria: fluency, unambiguity (single correct answer), topic relevance, option uniqueness, and answerability given the provided sources (as a proxy for hallucination). Results show that KNIGHT enables token- and cost-efficient generation from a reusable KG representation, achieves high quality across these criteria, and yields model rankings aligned with MMLU-style benchmarks, while supporting topic-specific and difficulty-controlled evaluation.}
}

Endnote

%0 Conference Paper
%T KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration
%A Mohammad Amanlou
%A Erfan Shafiee Moghaddam
%A Mahdi Nouri
%A Yasaman Amou Jafary
%A Farhan Farsi
%A Behnam Bahrak
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu	
%F pmlr-v328-amanlou26a
%I PMLR
%P 989--1024
%U https://proceedings.mlr.press/v328/amanlou26a.html
%V 328
%X With the rise of large language models (LLMs), they have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the time and cost of building specialized assessment datasets. We introduce KNIGHT, an LLM-based, knowledge-graph-driven framework for generating multiple-choice question (MCQ) datasets from external sources. KNIGHT constructs a topic-specific knowledge graph, a structured, parsimonious summary of entities and relations, that can be reused to generate instructor-controlled difficulty levels, including multi-hop questions, without repeatedly re-feeding the full source text. This KG acts as a compressed, reusable state, making question generation a cheap read over the graph. We instantiate KNIGHT on Wikipedia/Wikidata, while keeping the framework domain- and ontology-agnostic. As a case study, KNIGHT produces six MCQ datasets in History, Biology, and Mathematics. We evaluate quality on five criteria: fluency, unambiguity (single correct answer), topic relevance, option uniqueness, and answerability given the provided sources (as a proxy for hallucination). Results show that KNIGHT enables token- and cost-efficient generation from a reusable KG representation, achieves high quality across these criteria, and yields model rankings aligned with MMLU-style benchmarks, while supporting topic-specific and difficulty-controlled evaluation.

APA

Amanlou, M., Moghaddam, E.S., Nouri, M., Jafary, Y.A., Farsi, F. & Bahrak, B.. (2026). KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:989-1024 Available from https://proceedings.mlr.press/v328/amanlou26a.html.

KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration

Abstract

Cite this Paper

Related Material