

ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle

July 16, 2025
作者: Mihran Miroyan, Rose Niousha, Joseph E. Gonzalez, Gireeja Ranade, Narges Norouzi
cs.AI

Abstract

Large Language Models (LLMs) have shown strong performance on programming tasks, but can they generate "student-like" code the way real students do: imperfect, iterative, and stylistically diverse? We present ParaStudent, a systematic study of LLM-based student-like code generation in an introductory programming course setting. Using a dataset of timestamped student submissions across multiple semesters, we design low- and high-resolution experiments to model student progress and evaluate code outputs along semantic, functional, and stylistic dimensions. Our results show that fine-tuning significantly improves alignment with real student trajectories and captures error patterns, incremental improvements, and stylistic variations more faithfully. This study shows that modeling realistic student code requires capturing learning dynamics through context-aware generation, temporal modeling, and multi-dimensional evaluation. Code for experiments and evaluation is available at https://github.com/mmiroyan/ParaStudent.
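The abstract's "semantic, functional, and stylistic dimensions" can be made concrete with a toy scorer. The sketch below is purely illustrative and is not the paper's actual evaluation pipeline (that code lives in the linked repository); the function names (`functional_score`, `style_score`, `semantic_score`) and the assumption that submissions expose a `solve` function are hypothetical choices for this example.

```python
import ast
import difflib


def functional_score(code: str, tests: list[tuple]) -> float:
    """Functional dimension: fraction of test cases the submission passes.

    Assumes the submission defines a function named `solve` (an assumption
    made for this sketch, not a ParaStudent requirement).
    """
    env: dict = {}
    try:
        exec(code, env)  # toy sandbox; real graders isolate execution
    except Exception:
        return 0.0
    passed = 0
    for arg, expected in tests:
        try:
            if env["solve"](arg) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case simply scores as a failure
    return passed / len(tests)


def style_score(code: str) -> float:
    """Stylistic dimension (toy proxy): share of functions with a docstring."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not funcs:
        return 0.0
    return sum(1 for f in funcs if ast.get_docstring(f)) / len(funcs)


def semantic_score(generated: str, reference: str) -> float:
    """Semantic dimension (toy proxy): surface similarity of two submissions."""
    return difflib.SequenceMatcher(None, generated, reference).ratio()
```

For example, a submission `"def solve(x):\n    return x + 1\n"` checked against the tests `[(1, 2), (3, 6)]` would score 0.5 on the functional dimension: it passes the first case and fails the second. Real evaluations of student-like code would replace these proxies with sandboxed autograders, linters, and embedding-based similarity.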
PDF · July 22, 2025