Automated Assessment of Students’ Code Comprehension using LLM

Priti Oli, Rabin Banjade, Jeevan Chapagain, Vasile Rus
Proceedings of the 2024 AAAI Conference on Artificial Intelligence, PMLR 257:118-128, 2024.

Abstract

Assessing students’ answers, and in particular natural language answers, is a crucial challenge in the field of education. Advances in transformer-based models such as Large Language Models (LLMs) have led to significant progress on a variety of natural language tasks. Nevertheless, amid the growing trend of evaluating LLMs across diverse tasks, evaluating LLMs for automated answer assessment has not received much attention. To address this gap, we explore the potential of using LLMs for automated assessment of students’ short, open-ended answers in program comprehension tasks. In particular, we use LLMs to compare students’ explanations with expert explanations in the context of line-by-line explanations of computer programs. For comparison, we evaluate both decoder-only Large Language Models (LLMs) and encoder-based Semantic Textual Similarity (STS) models on the task of assessing the correctness of students’ explanations of computer code. Our findings indicate that decoder-only LLMs, when prompted in few-shot and chain-of-thought settings, perform comparably to fine-tuned encoder-based models in evaluating students’ short answers in the programming domain.
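As a concrete illustration of the two model families the abstract contrasts, the sketch below shows an encoder-based STS baseline (cosine similarity between sentence embeddings) next to a decoder-only LLM prompted in a few-shot, chain-of-thought style to judge a student's line explanation against an expert's. This is not the authors' implementation: the model names (all-MiniLM-L6-v2, gpt-4o-mini), the prompt wording, and the similarity threshold are illustrative assumptions.

# Illustrative sketch (not the paper's code): two ways to judge whether a student's
# line-by-line explanation of code matches an expert's reference explanation.

from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

student = "This line adds the current element of the array to the running total."
expert = "The statement accumulates each array element into the variable sum."

# --- Encoder-based STS baseline: cosine similarity between sentence embeddings. ---
# Model name and decision threshold are placeholders, not values from the paper.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode([student, expert], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
sts_correct = similarity >= 0.7  # assumed cutoff for labeling the answer "correct"

# --- Decoder-only LLM prompted few-shot with chain-of-thought style reasoning. ---
prompt = (
    "You assess whether a student's explanation of a line of code matches an "
    "expert's explanation. Reason step by step, then answer Correct or Incorrect.\n\n"
    "Example:\n"
    "Expert: The loop iterates over every element of the array.\n"
    "Student: It goes through each item in the array one by one.\n"
    "Reasoning: Both describe visiting every element, so the meanings match.\n"
    "Answer: Correct\n\n"
    f"Expert: {expert}\n"
    f"Student: {student}\n"
    "Reasoning:"
)
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model; the paper evaluated other LLMs
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
llm_verdict = response.choices[0].message.content

print(f"STS cosine similarity: {similarity:.2f} -> correct={sts_correct}")
print(f"LLM judgment:\n{llm_verdict}")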

Cite this Paper


BibTeX
@InProceedings{pmlr-v257-oli24a,
  title     = {Automated Assessment of Students’ Code Comprehension using LLM},
  author    = {Oli, Priti and Banjade, Rabin and Chapagain, Jeevan and Rus, Vasile},
  booktitle = {Proceedings of the 2024 AAAI Conference on Artificial Intelligence},
  pages     = {118--128},
  year      = {2024},
  editor    = {Ananda, Muktha and Malick, Debshila Basu and Burstein, Jill and Liu, Lydia T. and Liu, Zitao and Sharpnack, James and Wang, Zichao and Wang, Serena},
  volume    = {257},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--27 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v257/main/assets/oli24a/oli24a.pdf},
  url       = {https://proceedings.mlr.press/v257/oli24a.html}
}
Endnote
%0 Conference Paper
%T Automated Assessment of Students’ Code Comprehension using LLM
%A Priti Oli
%A Rabin Banjade
%A Jeevan Chapagain
%A Vasile Rus
%B Proceedings of the 2024 AAAI Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Muktha Ananda
%E Debshila Basu Malick
%E Jill Burstein
%E Lydia T. Liu
%E Zitao Liu
%E James Sharpnack
%E Zichao Wang
%E Serena Wang
%F pmlr-v257-oli24a
%I PMLR
%P 118--128
%U https://proceedings.mlr.press/v257/oli24a.html
%V 257
APA
Oli, P., Banjade, R., Chapagain, J. & Rus, V. (2024). Automated Assessment of Students’ Code Comprehension using LLM. Proceedings of the 2024 AAAI Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 257:118-128. Available from https://proceedings.mlr.press/v257/oli24a.html.
