Orca: Progressive Learning from Complex Explanation Traces of GPT-4
June 5, 2023
Authors: Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
cs.AI
Abstract
Recent research has focused on enhancing the capability of smaller models
through imitation learning, drawing on the outputs generated by large
foundation models (LFMs). Several issues affect the quality of these
models: limited imitation signals from shallow LFM outputs; small-scale,
homogeneous training data; and, most notably, a lack of rigorous evaluation,
which results in overestimating the small models' capabilities, as they tend
to imitate the style, but not the reasoning process, of LFMs. To address these
challenges, we develop Orca (We are working with our legal team to publicly
release a diff of the model weights in accordance with LLaMA's release policy
to be published at https://aka.ms/orca-lm), a 13-billion parameter model that
learns to imitate the reasoning process of LFMs. Orca learns from rich signals
from GPT-4 including explanation traces; step-by-step thought processes; and
other complex instructions, guided by teacher assistance from ChatGPT. To
promote this progressive learning, we tap into large-scale and diverse
imitation data with judicious sampling and selection. Orca surpasses
conventional state-of-the-art instruction-tuned models such as Vicuna-13B by
more than 100% in complex zero-shot reasoning benchmarks like Big-Bench Hard
(BBH) and 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH
benchmark and shows competitive performance (4 pts gap with optimized system
message) in professional and academic examinations like the SAT, LSAT, GRE, and
GMAT, both in zero-shot settings without CoT, while trailing behind GPT-4. Our
research indicates that learning from step-by-step explanations, whether these
are generated by humans or more advanced AI models, is a promising direction to
improve model capabilities and skills.
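
The abstract describes training on explanation traces elicited from a teacher model via system messages that request step-by-step reasoning. As a minimal sketch, assuming hypothetical names (the `SYSTEM_MESSAGE` wording, `build_training_record`, and the record schema are illustrative, not from the paper), one such explanation-tuned training record might be assembled like this:

```python
# Sketch of an "explanation-tuned" training record: the system message
# asks the teacher for step-by-step reasoning, so the student model can
# imitate the reasoning trace rather than just the answer style.

SYSTEM_MESSAGE = (
    "You are an AI assistant. Think step by step and justify your answer."
)

def build_training_record(user_instruction: str, teacher_response: str) -> dict:
    """Pack one (system, instruction, explanation-rich response) triple."""
    return {
        "system": SYSTEM_MESSAGE,
        "instruction": user_instruction,
        "response": teacher_response,  # includes the explanation trace
    }

record = build_training_record(
    "Is 17 a prime number?",
    "17 has no divisors between 2 and 4 (its square root is about 4.1), "
    "so 17 is prime. Answer: yes.",
)
```

The key design point, per the abstract, is that the target `response` carries the teacher's full reasoning, not only the final answer, which is what distinguishes this setup from shallow style imitation.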