
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

October 28, 2023
Authors: Hailin Chen, Amrita Saha, Steven Hoi, Shafiq Joty
cs.AI

Abstract

With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there is increasing interest in distilling the capabilities of closed-sourced LLMs into smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers for the student model to learn from. However, such a standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process, in which the student first attempts to solve a task, and the teacher then provides an adaptive refinement for the student to improve. Instead of feeding the student the teacher's prior, personalised distillation enables personalised learning for the student model, as it learns only on examples it makes mistakes on and learns to improve its own solutions. On code generation, personalised distillation consistently outperforms standard distillation with only one third of the data. With only 2.5-3K personalised examples, incurring a data-collection cost of $4-6, we boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.
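The abstract describes a data-collection loop where the student attempts a task, failed attempts are refined by the teacher, and only those refined examples are kept for training. Below is a minimal sketch of that loop, assuming hypothetical helper functions (student_generate, run_unit_tests, teacher_refine) that are not part of the paper's released code.

```python
# Minimal sketch of the personalised-distillation data-collection loop.
# The helpers student_generate, run_unit_tests, and teacher_refine are
# hypothetical stand-ins, not the authors' actual implementation.

def collect_personalised_examples(tasks, student_generate, run_unit_tests, teacher_refine):
    """Keep only tasks the student gets wrong, paired with the teacher's
    adaptive refinement of the student's own failed attempt."""
    personalised_data = []
    for task in tasks:
        attempt = student_generate(task)           # student tries the task first
        if run_unit_tests(task, attempt):          # already correct: nothing new to learn
            continue
        refined = teacher_refine(task, attempt)    # teacher (e.g. ChatGPT) improves the
                                                   # student's own failed solution
        if run_unit_tests(task, refined):          # keep only refinements that pass the tests
            personalised_data.append(
                {"task": task, "student_attempt": attempt, "refined_solution": refined}
            )
    return personalised_data
```

The resulting examples are what the student model is then fine-tuned on, which is why the method needs far fewer examples than standard distillation over a full instruction set.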