ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle

Abstract

Large Language Models (LLMs) have shown strong performance on programming tasks, but can they generate code the way real students do: imperfect, iterative, and stylistically diverse? We present ParaStudent, a systematic study of LLM-based "student-like" code generation in an introductory programming course setting. Using a dataset of timestamped student submissions across multiple semesters, we design low- and high-resolution experiments to model student progress and evaluate code outputs along semantic, functional, and stylistic dimensions. Our results show that fine-tuning significantly improves alignment with real student trajectories and captures error patterns, incremental improvements, and stylistic variations more faithfully. This study shows that modeling realistic student code requires capturing learning dynamics through context-aware generation, temporal modeling, and multi-dimensional evaluation. Code for experiments and evaluation is available at https://github.com/mmiroyan/ParaStudent.
