Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
October 18, 2024
Authors: Xiaochuan Li, Zichun Yu, Chenyan Xiong
cs.AI
Abstract
Synthetic data has been widely used to train large language models, but its
generative nature inevitably introduces noisy, non-informative, and misleading
learning signals. In this paper, we propose Montessori-Instruct, a novel data
synthesis framework that tailors the data synthesis ability of the teacher
language model toward the student language model's learning process.
Specifically, we utilize local data influence of synthetic training data points
on students to characterize students' learning preferences. Then, we train the
teacher model with Direct Preference Optimization (DPO) to generate synthetic
data tailored toward student learning preferences. Experiments with
Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and
MT-Bench demonstrate that Montessori-Instruct significantly outperforms
standard synthesis methods, with relative improvements of 18.35% and 46.24%,
respectively. Our method also beats data synthesized by a stronger teacher
model, GPT-4o. Further analysis confirms the benefit of the teacher learning
to generate more influential training data for improving the student's
learning, the advantage of local data influence in accurately measuring
student preferences, and the robustness of Montessori-Instruct across
different student models. Our code and data are open-sourced at
https://github.com/cxcscmu/Montessori-Instruct.
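The two steps in the abstract are (1) score each synthetic data point by its local influence on the student and (2) train the teacher with DPO so that it prefers influential data over unhelpful data. Both can be illustrated in a toy setting. This is a minimal sketch under stated assumptions, not the authors' implementation. A linear regression model with squared-error loss stands in for the student LLM. Each candidate is an (x, y) pair rather than a synthetic instruction-response pair. Local data influence is assumed to be the change in the student's loss on a held-out reference set after one gradient step on a single candidate. The helper names (`local_data_influence`, `build_preference_pair`, `dpo_loss`), the learning rate, and the pairing rule (most helpful candidate as "chosen", most harmful as "rejected") are hypothetical. The loss in `dpo_loss` is the standard DPO objective computed from sequence log-probabilities.

```python
import math

# Toy "student": a linear model y_hat = w . x trained with squared error.
# It stands in for the student LLM and is NOT the authors' setup.

def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse_loss(w, data):
    """Mean squared error of the linear student over (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def sgd_step(w, point, lr):
    """One gradient step on a single (x, y) point under squared error."""
    x, y = point
    grad_scale = 2.0 * (predict(w, x) - y)  # gradient of (w.x - y)^2 is 2(w.x - y) x
    return [wi - lr * grad_scale * xi for wi, xi in zip(w, x)]

def local_data_influence(w, point, reference, lr=0.1):
    """Assumed definition: reference loss after one step on `point`
    minus reference loss before. Negative means the point helped."""
    before = mse_loss(w, reference)
    after = mse_loss(sgd_step(w, point, lr), reference)
    return after - before

def build_preference_pair(w, candidates, reference, lr=0.1):
    """Hypothetical pairing rule: the candidate that lowers the reference
    loss most is 'chosen'; the one that raises it most is 'rejected'."""
    scores = [local_data_influence(w, c, reference, lr) for c in candidates]
    best = min(range(len(candidates)), key=scores.__getitem__)
    worst = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], candidates[worst], scores

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence
    log-probabilities under the policy (the teacher being trained)
    and under a fixed reference model."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(margin)), computed without overflow for large |margin|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Usage: the reference set is consistent with w = [2, -1].
w0 = [0.0, 0.0]
reference = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
candidates = [
    ([1.0, 0.0], 2.0),   # consistent with the reference set
    ([1.0, 0.0], -5.0),  # misleading label
    ([0.0, 0.0], 3.0),   # non-informative: zero features, zero gradient
]
chosen, rejected, scores = build_preference_pair(w0, candidates, reference)
print(scores)  # approximately [-0.72, 2.5, 0.0]
print(chosen, rejected)
```

In this toy run, the consistent candidate lowers the reference loss and becomes the chosen sample. The mislabeled candidate raises the loss and becomes the rejected sample. The zero-feature candidate leaves the student unchanged, which mirrors the misleading and non-informative signals the abstract describes. In a real DPO setup, the log-probabilities passed to `dpo_loss` would come from the teacher model and its reference model for the chosen and rejected synthetic samples.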