Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Abstract

Synthetic data has been widely used to train large language models, but its generative nature inevitably introduces noisy, non-informative, and misleading learning signals. In this paper, we propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model's learning process. Specifically, we use the local data influence of synthetic training data points on students to characterize students' learning preferences. We then train the teacher model with Direct Preference Optimization (DPO) to generate synthetic data tailored to those learning preferences. Experiments with Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and MT-Bench show that Montessori-Instruct significantly outperforms standard synthesis methods, by 18.35% and 46.24% in relative terms. Our method also beats data synthesized by a stronger teacher model, GPT-4o. Further analysis confirms that training the teacher to generate more influential data improves student learning, that local data influence accurately measures student preferences, and that Montessori-Instruct is robust across different student models. Our code and data are open-sourced at https://github.com/cxcscmu/Montessori-Instruct.
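The abstract names two mechanisms without spelling them out: scoring each synthetic data point by its local influence on the student, and training the teacher with DPO on the resulting preferences. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation (the repository linked in the abstract holds that). It assumes local data influence is the drop in the student's loss on a reference set after a single gradient step on one data point, uses a toy one-parameter linear model as a stand-in for the student LM, and pairs the most and least influential candidates per prompt as chosen/rejected. The function names (`local_influence`, `preference_pairs`, `dpo_loss`) are hypothetical; `dpo_loss` is the standard per-pair DPO objective computed from sequence log-probabilities.

```python
import math

# Hypothetical stand-in for the student: a 1-D linear model y_hat = w * x
# trained with squared error. A real student would be an LLM.


def loss(w, data):
    """Mean squared error of the student with weight w on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def sgd_step(w, point, lr=0.1):
    """Return the weight after one gradient step on a single (x, y) point."""
    x, y = point
    grad = 2 * (w * x - y) * x
    return w - lr * grad


def local_influence(w, point, reference, lr=0.1):
    """Assumed definition: reduction in reference loss after one step on `point`.
    Positive means the point helps the student; negative means it hurts."""
    return loss(w, reference) - loss(sgd_step(w, point, lr), reference)


def preference_pairs(w, candidates_by_prompt, reference, lr=0.1):
    """For each prompt, pair the most influential candidate (chosen) with the
    least influential one (rejected), skipping prompts with no spread."""
    pairs = []
    for prompt, candidates in candidates_by_prompt.items():
        scored = sorted(
            ((local_influence(w, p, reference, lr), p) for p in candidates),
            key=lambda t: t[0],
        )
        (low_score, low), (high_score, high) = scored[0], scored[-1]
        if high_score > low_score:
            pairs.append((prompt, high, low))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard per-pair DPO loss, -log sigmoid(beta * margin), from sequence
    log-probs under the teacher being trained and a frozen reference copy."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Numerically stable -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    reference = [(1.0, 2.0), (2.0, 4.0)]  # toy reference set, true rule y = 2x
    candidates = {"p1": [(1.0, -2.0), (1.0, 2.0)]}  # one noisy, one clean point
    print(preference_pairs(0.0, candidates, reference))
    # prints [('p1', (1.0, 2.0), (1.0, -2.0))]
```

In this toy run the clean point (1.0, 2.0) lowers the reference loss and becomes the chosen sample, while the noisy point raises it and becomes the rejected one; in the setting the abstract describes, such pairs would then feed DPO training of the teacher.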
