ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle

Abstract

Large Language Models (LLMs) have shown strong performance on programming tasks, but can they generate code the way real students write it: imperfect, iterative, and stylistically diverse? We present ParaStudent, a systematic study of LLM-based "student-like" code generation in an introductory programming course setting. Using a dataset of timestamped student submissions across multiple semesters, we design low- and high-resolution experiments to model student progress and evaluate code outputs along semantic, functional, and stylistic dimensions. Our results show that fine-tuning significantly improves alignment with real student trajectories and captures error patterns, incremental improvements, and stylistic variations more faithfully. This study shows that modeling realistic student code requires capturing learning dynamics through context-aware generation, temporal modeling, and multi-dimensional evaluation. Code for experiments and evaluation is available at https://github.com/mmiroyan/ParaStudent.
