Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
October 28, 2023
Authors: Hailin Chen, Amrita Saha, Steven Hoi, Shafiq Joty
cs.AI
Abstract
With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there is increasing interest in distilling the capabilities of closed-sourced LLMs into smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers for the student model to learn from. However, such a standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process, in which the student first attempts to solve a task, and the teacher then provides an adaptive refinement for the student to improve. Instead of feeding the student the teacher's prior, personalised distillation enables personalised learning for the student model, as it only learns from examples on which it makes mistakes and learns to improve its own solutions. On code generation, personalised distillation consistently outperforms standard distillation with only one-third of the data. With only 2.5-3K personalised examples, collected at a cost of $4-6, we boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.
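
The abstract outlines a data-collection loop: the student first attempts each task, and the teacher is only consulted to refine the attempts the student gets wrong. The sketch below is a minimal illustration of that loop, not the authors' implementation; the helper names (`student_generate`, `run_tests`, `teacher_refine`) and the test-based filtering of refinements are assumptions made for illustration.

```python
# Minimal sketch of a personalised-distillation data-collection loop,
# assuming hypothetical helpers for the student model, unit-test execution,
# and the closed-source teacher (e.g. ChatGPT). Not the paper's actual code.

from dataclasses import dataclass
from typing import List


@dataclass
class PersonalisedExample:
    task: str              # natural-language coding task / prompt
    student_attempt: str   # the student model's own (failing) solution
    refined_solution: str  # teacher's adaptive refinement of that attempt


def student_generate(task: str) -> str:
    """Placeholder: sample a candidate solution from the open-sourced student."""
    raise NotImplementedError


def run_tests(task: str, code: str) -> bool:
    """Placeholder: execute the task's unit tests and report pass/fail."""
    raise NotImplementedError


def teacher_refine(task: str, attempt: str) -> str:
    """Placeholder: ask the teacher to refine the student's own attempt,
    rather than to write a solution from scratch."""
    raise NotImplementedError


def collect_personalised_data(tasks: List[str]) -> List[PersonalisedExample]:
    """Keep only tasks the student fails, paired with the teacher's
    refinement of the student's attempt (assumed to be verified by tests)."""
    dataset: List[PersonalisedExample] = []
    for task in tasks:
        attempt = student_generate(task)
        if run_tests(task, attempt):
            continue  # student already solves it; nothing to distil
        refined = teacher_refine(task, attempt)
        if run_tests(task, refined):  # keep only refinements that pass
            dataset.append(PersonalisedExample(task, attempt, refined))
    return dataset
```

The collected examples would then be used to fine-tune the student, so that it learns to correct its own mistakes rather than to imitate teacher solutions it never attempted.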