ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle
July 16, 2025
Authors: Mihran Miroyan, Rose Niousha, Joseph E. Gonzalez, Gireeja Ranade, Narges Norouzi
cs.AI
Abstract
Large Language Models (LLMs) have shown strong performance on programming
tasks, but can they write code the way real students do - imperfect,
iterative, and stylistically diverse? We present ParaStudent, a systematic
study of LLM-based "student-like" code generation in an introductory
programming course setting. Using a dataset of timestamped student submissions
across multiple semesters, we design low- and high-resolution experiments to
model student progress and evaluate code outputs along semantic, functional,
and stylistic dimensions. Our results show that fine-tuning significantly
improves alignment with real student trajectories and captures error patterns,
incremental improvements, and stylistic variations more faithfully. This study
shows that modeling realistic student code requires capturing learning dynamics
through context-aware generation, temporal modeling, and multi-dimensional
evaluation. Code for experiments and evaluation is available at
https://github.com/mmiroyan/ParaStudent.
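
To make the three evaluation axes concrete, below is a minimal, hypothetical Python sketch of how semantic, functional, and stylistic signals could be scored for a single generated submission. The function names, metrics, and test format here are illustrative assumptions, not the paper's actual pipeline; see the linked repository for the real evaluation code.

    # Illustrative sketch only (standard library); not the ParaStudent implementation.
    import ast
    import difflib

    def semantic_similarity(generated: str, reference: str) -> float:
        # Crude proxy for semantic closeness: similarity ratio over whitespace tokens.
        return difflib.SequenceMatcher(
            None, generated.split(), reference.split()
        ).ratio()

    def functional_score(generated: str, tests: list) -> float:
        # Fraction of (expression, expected_value) test cases the submission passes.
        env: dict = {}
        try:
            exec(generated, env)  # run the student-like code in a fresh namespace
        except Exception:
            return 0.0  # code that does not even run fails all tests
        passed = 0
        for expr, expected in tests:
            try:
                passed += eval(expr, env) == expected
            except Exception:
                pass  # a crashing test case counts as a failure
        return passed / len(tests) if tests else 0.0

    def style_features(code: str) -> dict:
        # Simple style fingerprint: length, comment density, mean identifier length.
        lines = code.splitlines()
        try:
            names = [n.id for n in ast.walk(ast.parse(code)) if isinstance(n, ast.Name)]
        except SyntaxError:
            names = []  # realistic student code may not even parse
        return {
            "num_lines": len(lines),
            "comment_ratio": sum(l.strip().startswith("#") for l in lines) / max(len(lines), 1),
            "mean_name_len": sum(map(len, names)) / max(len(names), 1),
        }

    # Example usage on a toy submission:
    submission = "def square(x):\n    return x * x\n"
    print(semantic_similarity(submission, "def square(n):\n    return n * n\n"))
    print(functional_score(submission, [("square(3)", 9), ("square(-2)", 4)]))
    print(style_features(submission))

Keeping the three scores separate, rather than collapsing them into one number, mirrors the abstract's point that realistic student code must be judged on semantics, functionality, and style at once: a submission can be functionally wrong yet semantically and stylistically very student-like.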