<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Proceedings of Machine Learning Research</title>
    <description>Proceedings of the Innovation and Responsibility in AI-Supported Education Workshop
  Held in Philadelphia, Pennsylvania, USA on 03 March 2025

Published as Volume 273 by the Proceedings of Machine Learning Research on 31 March 2025.

Volume Edited by:
  Zichao Wang
  Simon Woodhead
  Muktha Ananda
  Debshila Basu Mallick
  James Sharpnack
  Jill Burstein

Series Editors:
  Neil D. Lawrence
</description>
    <link>https://proceedings.mlr.press/v273/</link>
    <atom:link href="https://proceedings.mlr.press/v273/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Mon, 12 May 2025 09:44:42 +0000</pubDate>
    <lastBuildDate>Mon, 12 May 2025 09:44:42 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>Can LLMs Teach Human Learners to Understand Concepts Through Analogies?</title>
        <description>Large Language Models (LLMs) hold significant potential to revolutionize education by enabling personalized and effective learning experiences. As cognitive learning principles are gradually applied to designing educative LLMs, our research focuses on this crucial question: can LLMs enhance student comprehension of complex concepts through analogy-based tutoring, a pedagogical method proven useful in learning science? To address this, we propose a two-stage experimental framework. First, LLM tutors generate analogies for teaching specific target concepts, leveraging prompting techniques to adapt to simulated or real student profiles. Second, these learners engage with the analogies and subsequently complete multiple-choice questions to evaluate their conceptual understanding. Our initial findings reveal that analogy-based tutoring enhances student engagement and conceptual mastery, achieving a notable improvement in comprehension. These results underscore the effectiveness of LLM-driven analogy-based tutoring in advancing educational outcomes and pave the way for future research in this domain.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/ye25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/ye25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Classroom Observation Evaluation with Large Language Models</title>
        <description>To improve the efficiency of evaluating classroom Instructional Support (IS) and enhance the reliability of the IS score evaluation system, we proposed a novel annotation protocol based on classroom discourse types and a framework that employed large language models (LLMs) as the core component to estimate IS scores automatically. We constructed the SentTag dataset, which was annotated using the proposed annotation protocol. The Fleiss’ Kappa among all annotators was 0.7120. Additionally, Llama 3.1 models were fine-tuned on this dataset, achieving an accuracy of 0.7864 in classifying discourse types. While these discourse-type features could not predict IS scores accurately (RMSE = 2.6584 and PCC = 0.1197), they could potentially serve as useful qualitative feedback to teachers on their classroom discourse. Future research will explore the integration of multimodal features, local session characteristics, and the generalization of the framework to diverse classroom settings.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/wang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/wang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>MultiTutor: Collaborative LLM Agents for Multimodal Student Support</title>
        <description>The advent of Large Language Models (LLMs) has revolutionized education, introducing AI tools that enhance teaching and learning. Once purely natural language processors, LLMs have evolved into autonomous agents capable of complex tasks, from software development to high-level trading decisions. However, most educational applications focus only on classroom simulations or single-agent automation, leaving the potential of multi-agent systems for personalized support underexplored. To address this, we propose MultiTutor, a multi-agent tutoring framework tailored to individual student needs. MultiTutor uses internet searches and code generation to produce multimodal outputs like images and animations while expert agents synthesize information to deliver explanatory text, create visualizations, suggest resources, design practice problems, and develop interactive simulations. By identifying knowledge gaps and scaffolding learning, MultiTutor offers a transformative, accessible approach to education. Evaluation against baseline models across metrics like cognitive complexity, readability, depth, and diversity shows MultiTutor consistently outperforms in quality and relevance. Case studies further highlight its potential as an innovative solution for automated tutoring and student support.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/sun25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/sun25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Representational Alignment Supports Effective Teaching</title>
        <description>A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands – to share the student’s representation of the world. In this work, we introduce a new controlled experimental setting, GRADE, to study pedagogy and representational alignment. We use GRADE through a series of machine-machine and machine-human teaching experiments to characterize a utility curve defining a relationship between representational alignment, teacher expertise, and student learning outcomes. We find that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), but that this effect is moderated by the size and representational diversity of the class being taught. We use these insights to design a preliminary classroom matching procedure, GRADE-Match, that optimizes the assignment of students to teachers. When designing machine teachers, our results suggest that it is important to focus not only on accuracy, but also on representational alignment with human learners.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/sucholutsky25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/sucholutsky25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving</title>
        <description>Providing effective feedback is important for student learning in programming problem-solving. In this context, Large Language Models (LLMs) have emerged as potential tools to automate feedback generation. However, their reliability and ability to identify reasoning errors in student code remain poorly understood. This study evaluates the performance of four LLMs (GPT-4o, GPT-4o mini, GPT-4-Turbo, and Gemini-1.5-pro) on a benchmark dataset of 45 student solutions. We assessed the models’ capacity to provide accurate and insightful feedback, particularly in identifying reasoning mistakes. Our analysis reveals that 63% of feedback hints were accurate and complete, while 37% contained mistakes, including incorrect line identification, flawed explanations, or hallucinated issues. These findings highlight the potential and limitations of LLMs in programming education and underscore the need for improvements to enhance reliability and minimize risks in educational applications.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/silva25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/silva25a.html</guid>
        
        
      </item>
    
      <item>
        <title>BIBLIOSMIA: Hyper-Personalized Consistent Stories for Enhanced Social Emotional Learning</title>
        <description>Consistent visual storytelling plays a central role in how humans teach their children to understand their emotions, relationships with others and the world around them. It enhances children’s cognitive and social-emotional development by providing engaging, believable narratives that help them navigate emotional complexities in a safe, imaginative context. This prevalence of visual storytelling in our lives has made it a prime application for technological advancements in artificial intelligence (AI). With AI integrations, digital stories can be created across an infinite set of topics and readily adapted to personalized contexts. As image generation algorithms advance, digital storytelling can be enhanced even further to incorporate visual elements that are unique to the author or the desired reader population. However, the burgeoning field of multi-modal story generation currently suffers from the problem of consistency - a critical element for preserving the plot lines of a story and the ability of children to relate to the characters therein. To mitigate this, we propose Bibliosmia, a novel framework for designing consistent digital stories. The framework encompasses three main components: a story generation module, an alignment module and an image generation and validation module that collectively preserve key narrative and visual story elements to allow expert and new authors alike to craft deeply personal developmental stories for children. We evaluated the effectiveness of the framework in the context of an online automatic story generation application. Our experimental results demonstrate Bibliosmia’s superior performance in prompt similarity (0.307 CLIP Score) and near-top-tier identity consistency (0.860 CLIP Score), surpassing other approaches in scalability and user satisfaction. 
These findings highlight Bibliosmia’s effectiveness in delivering high-quality, personalized storytelling experiences, setting a new standard in multi-modal digital storytelling.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/shahriyar25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/shahriyar25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Enhancing Learning Outcomes within a Large-Scale Online Learning System through AI-Powered Feedback</title>
        <description>Building on prior research, which demonstrated the effectiveness of a learning analytics-based feedback system in improving learner engagement and learning outcomes, this study addresses scalability challenges by automating feedback authoring using generative AI. Focusing on critical issues in distance education, including limited academic support, social isolation, and reduced learner motivation, we design and evaluate an AI-powered feedback system within a large-scale online learning environment. This study utilizes input data comprising learners’ online learning environment interactions, learning material engagement patterns, academic performance metrics, behavioral indicators, and demographic characteristics. The system generates AI-powered personalized feedback interventions based on the ARCS-V Motivation Model, Self-Regulated Learning principles, and Nudge Theory as its primary outputs. To assess the system’s effectiveness, more than 30,000 learners at a large distance education university will be randomly assigned to experimental and control groups. Preliminary work demonstrated the system’s readiness for a pilot evaluation. The next steps include assessing the system’s impact on diverse learner subgroups and refining system design based on user feedback. The study aims to advance our understanding of how AI-powered, personalized feedback influences self-regulated learning, motivation, and learning outcomes in online environments.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/ozturk25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/ozturk25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs</title>
        <description>The process of creating educational materials is both time-consuming and demanding for educators. This research explores the potential of Large Language Models (LLMs) to streamline this task by automating the generation of extended reading materials and relevant course suggestions. Using the TED-Ed Dig Deeper sections as an initial exploration, we investigate how supplementary articles can be enriched with contextual knowledge and connected to additional learning resources. Our method begins by generating extended articles from video transcripts, leveraging LLMs to include historical insights, cultural examples, and illustrative anecdotes. A recommendation system employing semantic similarity ranking identifies related courses, followed by an LLM-based refinement process to enhance relevance. The final articles are tailored to seamlessly integrate these recommendations, ensuring they remain cohesive and informative. Experimental evaluations demonstrate that our model produces high-quality content and accurate course suggestions, assessed through metrics such as Hit Rate, semantic similarity, and coherence. Our experimental analysis highlights the nuanced differences between the generated and existing materials, underscoring the model’s capacity to offer more engaging and accessible learning experiences. This study showcases how LLMs can bridge the gap between core content and supplementary learning, providing students with additional recommended resources while also assisting teachers in designing educational materials.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/liou25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/liou25a.html</guid>
        
        
      </item>
    
      <item>
        <title>ARCHED: A Human-Centered Framework for Transparent, Responsible, and Collaborative AI Assisted Instructional Design</title>
        <description>Integrating Large Language Models (LLMs) in educational technology reveals unprecedented opportunities to improve instructional design (ID), yet current approaches often prioritize automation over pedagogical rigor and human agency. This paper introduces ARCHED (AI for Responsible, Collaborative, Human-centered Education Instructional Design), a framework that implements a structured multi-stage workflow between educators and AI. Unlike existing tools that generate complete instructional materials autonomously, ARCHED cascades the development into distinct stages, from learning objective formulation to assessment design, each guided by Bloom’s taxonomy and enhanced by LLMs. This framework employs multiple specialized AI components that work in concert: one generating diverse pedagogical options, another evaluating their alignment with learning objectives while maintaining human educators as primary decision-makers. ARCHED addresses critical gaps in current AI-assisted instructional design regarding transparency, pedagogical foundation, and meaningful human agency through this approach. This research advances the responsible integration of AI in education by providing a concrete, theoretically grounded framework that prioritizes human expertise and educational accountability.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/li25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/li25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Efficient Multi-Task Inference with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring</title>
        <description>The integration of Artificial Intelligence (AI) in education requires scalable and efficient frameworks that balance performance, adaptability, and cost. This paper addresses these needs by proposing a shared backbone model architecture enhanced with lightweight LoRA adapters for task-specific fine-tuning, targeting the automated scoring of student responses across 27 mutually exclusive tasks. By achieving competitive performance (average QWK of 0.848 compared to 0.888 for fully fine-tuned models) while reducing GPU memory consumption by 60% and inference latency by 40%, the framework demonstrates significant efficiency gains. This approach aligns with the workshop’s focus on improving language models for educational tasks, creating responsible innovations for cost-sensitive deployment, and supporting educators by streamlining assessment workflows. The findings underscore the potential of scalable AI to enhance learning outcomes while maintaining fairness and transparency in automated scoring systems.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/latif25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/latif25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Comparing Few-Shot Prompting of GPT-4 LLMs with BERT Classifiers for Open-Response Assessment in Tutor Equity Training</title>
        <description>Assessing learners in ill-defined domains, such as scenario-based human tutoring training, is an area of limited research. Equity training requires a nuanced understanding of context, but do contemporary large language models (LLMs) have a knowledge base that can navigate these nuances? Legacy transformer models like BERT, in contrast, have less real-world knowledge but can be more easily fine-tuned than commercial LLMs. Here, we study whether fine-tuning BERT on human annotations outperforms state-of-the-art LLMs (GPT-4o and GPT-4-Turbo) with few-shot prompting and instruction. We evaluate performance on four prediction tasks involving generating and explaining open-ended responses in advocacy-focused training lessons in a higher education student population learning to become middle school tutors. Leveraging a dataset of 243 human-annotated open responses from tutor training lessons, we find that BERT demonstrates superior performance using an offline fine-tuning approach, which is more resource-efficient than commercial GPT models. We conclude that contemporary GPT models may not adequately capture nuanced response patterns, especially in complex tasks requiring explanation. This work advances the understanding of AI-driven learner evaluation under the lens of fine-tuning versus few-shot prompting on the nuanced task of equity training, contributing to more effective training solutions and assisting practitioners in choosing adequate assessment methods.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/kakarla25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/kakarla25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Adaptive Knowledge Assessment In Simulated Coding Interviews</title>
        <description>We present a system for simulating student coding interview responses to sequential interview questions, with the goal of accurately inferring student expertise levels. With these simulated students, we explored fixed and adaptive question selection policies, where the adaptive policy exploits a knowledge component dependency graph to maximize information gain. Our results show that adaptive questioning policies yield increasing benefits over a fixed policy as student expertise levels rise, achieving expert assessment F1-scores of 0.4-0.8 for student expertise prediction compared to 0.25-0.35 for fixed strategies.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/ion25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/ion25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Improving LLM-based Automatic Essay Scoring with Linguistic Features</title>
        <description>Automatic Essay Scoring (AES) assigns scores to student essays, reducing the grading workload for instructors. Developing a scoring system capable of handling essays across diverse prompts is challenging due to the open-ended and varied nature of the writing task. Previous work has shown promising results in AES by prompting large language models (LLMs). While prompting LLMs is data-efficient, it does not surpass supervised methods trained with extracted linguistic features (Li and Ng, 2024). In this paper, we combine both approaches by incorporating linguistic features into LLM-based scoring. Experiments show promising results from this hybrid method for both in-domain and out-of-domain essay prompts.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/hou25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/hou25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Towards an Efficient, Customizable, and Accessible AI Tutor</title>
        <description>We propose a novel AI tutoring system that combines a Retrieval-Augmented Generation (RAG) pipeline with a lightweight language model to provide efficient, customizable, and accessible educational support. Designed to operate offline with minimal computational resources, the system addresses the challenges faced by resource-constrained communities. To develop its knowledge capabilities, we explore various retrieval strategies starting from a knowledge base of college textbooks. This work lays the foundation for developing adaptable and equitable AI tutoring solutions that bridge educational gaps and empower learners in under-resourced communities.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/hevia25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/hevia25a.html</guid>
        
        
      </item>
    
      <item>
        <title>AI Awareness Survey of Educators</title>
        <description>In the age of Large Language Models (LLMs) purportedly being used by everyone, including students, do users actually know what they’re wielding? We present a study that aims to gauge the recognition of AI assisted processes in everyday life and understand what questions and concerns people have about AI. We particularly focus on educators, as the role of AI in the classroom raises concerns around technical and information literacy and the very nature of education.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/heady25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/heady25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Rethinking Math Benchmarks: Implications for AI in Education</title>
        <description>Several datasets have been created to evaluate LLM performance on mathematical reasoning tasks. Performance on these benchmarks is used as a proxy for a model’s math ability and to rank their capability relative to other models. These rankings play a crucial role for AIEd practitioners in selecting models for applications like math tutoring. Recent research has argued that several of these benchmarks have become too saturated, prompting the creation of new datasets with more difficult tasks. How can we gauge the effectiveness of these benchmarks for measuring math skills and producing reliable rankings? Leveraging the psychometric framework of Item Response Theory, we examine three math benchmarks: GSM8K, MATH, and MathOdyssey. We find that GSM8K and MathOdyssey are not suited to properly evaluate the current range of frontier model abilities, and are instead suited to models with lower and higher math abilities respectively. Moreover, current rankings derived from these benchmarks are unstable and fail to reliably capture the latent math ability they aim to measure. To remedy these issues, we recommend the integration of IRT analysis into the process of selecting questions for future benchmarks.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/castleman25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/castleman25a.html</guid>
        
        
      </item>
    
      <item>
        <title>A Personalized AI Coach to Assist in Self-Directed Learning</title>
        <description>Personalized learning is a powerful tool in online education, yet its application in inquiry-based modeling environments remains underexplored. Previous work has shown that learners who engage in a cycle of construction, parameterization, and simulation, which we refer to as the exploration cycle, create models with higher complexity and variety. To study these findings further, we present an “exploration coach” that provides personalized feedback within the Virtual Experimental Research Assistant (VERA), an interactive learning environment for conceptual modeling of complex systems that evaluates models through agent-based simulations. Our architecture, which classifies learners into groups using clustering techniques, allows us to determine what type of feedback would be useful to a learner at any point in their modeling journey. The coach then uses procedural scaffolding to guide learners through the exploration cycle. Lastly, we illustrate how these categorizations and the exploration cycle map onto the cycle of self-directed learning.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/buckley25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/buckley25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs</title>
        <description>Factuality is a necessary precursor to useful educational tools. As adoption of Large Language Models (LLMs) in education continues to grow, ensuring correctness in all settings is paramount. Despite their strong English capabilities, LLM performance in other languages is largely untested. In this work, we evaluate the correctness of the Llama3.1 family of models in answering factual questions appropriate for middle and high school students. We demonstrate that LLMs not only provide extraneous and less truthful information, but also exacerbate existing biases against rare languages.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/berman25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/berman25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Evaluating Fairness in AI-Assisted Remote Proctoring</title>
        <description>Remote proctors make decisions about whether test takers have violated testing rules and, as a result, whether to certify test takers’ scores. These decisions rely on both AI signals and human evaluation of test-taking behaviors. Given that fairness is a key component of test validity evidence, it is critical that proctors’ decisions are unbiased with respect to proctor and test-taker background characteristics (e.g., gender, age, and nationality). In this study, we empirically evaluate whether proctor or test-taker background characteristics affect whether a test taker is flagged for rule violations. Results suggest that proctor and test-taker nationality may influence proctoring decisions, whereas gender and age do not. The direction of the influence generally reflects an “in-group, out-group” bias: proctors are less likely to identify rule violations among test takers with similar nationalities as proctors (in-group favoring) and more likely to identify rule violations among test takers of different nationalities (out-group disfavoring). Results also suggest that decisions based on AI signals may be less prone to in-group/out-group bias than decisions based on human evaluation only, although more research is needed to support this finding.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/belzak25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/belzak25a.html</guid>
        
        
      </item>
    
      <item>
        <title>M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards</title>
        <description>We present a demonstration of a web-based system called M2LADS (“System for Generating Multimodal Learning Analytics Dashboards”), designed to integrate, synchronize, visualize, and analyze multimodal data recorded during computer-based learning sessions with biosensors. This system presents a range of biometric and behavioral data on web-based dashboards, providing detailed insights into various physiological and activity-based metrics. The multimodal data visualized include electroencephalogram (EEG) data for assessing attention and brain activity, heart rate metrics, eye-tracking data to measure visual attention, webcam video recordings, and activity logs of the monitored tasks. M2LADS aims to assist data scientists in two key ways: (1) by providing a comprehensive view of participants’ experiences, displaying all data categorized by the activities in which participants are engaged, and (2) by synchronizing all biosignals and videos, facilitating easier data relabeling if any activity information contains errors.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/becerra25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/becerra25a.html</guid>
        
        
      </item>
    
      <item>
        <title>The LEVI training Hub: Evidence-Based Evaluation for AI in Education</title>
        <description>The rapid growth of education technology (ed tech) tools, including AI-powered applications, has highlighted the need for robust evaluation frameworks, particularly at early development stages. Current evaluation models, such as the Every Student Succeeds Act (ESSA) evidence tiers created by the U.S. Department of Education, may be appropriate for many education research activities, but miss critical stages in emerging AI-driven interventions. To support the Learning Engineering Virtual Institute (LEVI), a research collaboratory with the goal of doubling math learning rates in middle school students, we have developed a new evidence matrix to bridge this gap. This matrix incorporates a two-dimensional approach that evaluates research methods alongside outcome variables, enabling nuanced assessments of interventions along an ordered process. By categorizing research methods into five levels, ranging from randomized controlled trials to qualitative studies and modeling efforts, this matrix ensures comprehensive evaluation. Complementary outcome measures, emphasizing math learning gains, engagement, and model performance, contextualize these findings. This framework fosters alignment between research rigor and practical application, offering valuable insights into scaling educational innovations responsibly.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/andres25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/andres25a.html</guid>
        
        
      </item>
    
      <item>
        <title>AI Mentors for Student Projects: Spotting Early Issues in Computer Science Proposals</title>
        <description>When executed well, project-based learning (PBL) engages students’ intrinsic motivation, encourages students to learn far beyond a course’s limited curriculum, and prepares students to think critically and maturely about the skills and tools at their disposal. However, educators experience mixed results when using PBL in their classrooms: some students thrive with minimal guidance and others flounder. Early evaluation of project proposals could help educators determine which students need more support, yet evaluating project proposals and student aptitude is time-consuming and difficult to scale. In this work, we design, implement, and conduct an initial user study (n = 36) for a software system that collects project proposals and aptitude information to support educators in determining whether a student is ready to engage with PBL. We find that (1) users perceived the system as helpful for writing project proposals and identifying tools and technologies to learn more about, (2) educator ratings indicate that users with less technical experience in the project topic tend to write lower-quality project proposals, and (3) GPT-4o’s ratings show agreement with educator ratings. While the prospect of using LLMs to rate the quality of students’ project proposals is promising, its long-term effectiveness strongly hinges on future efforts at characterizing indicators that reliably predict students’ success and motivation to learn.</description>
        <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/aher25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/aher25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Scribbles That Speak: AI and Handwriting Analysis to Address Pathological Challenges</title>
        <description></description>
        <pubDate>Sun, 30 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/govindaraju25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/govindaraju25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Preface: Innovation and Responsibility in AI-Supported Education</title>
        <description></description>
        <pubDate>Sun, 30 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v273/basumallick25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v273/basumallick25a.html</guid>
        
        
      </item>
    
  </channel>
</rss>
