Orca: Progressive Learning from Complex Explanation Traces of GPT-4
June 5, 2023
Authors: Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
cs.AI
Abstract
Recent research has focused on enhancing the capability of smaller models
through imitation learning, drawing on the outputs generated by large
foundation models (LFMs). A number of issues impact the quality of these
models, ranging from limited imitation signals from shallow LFM outputs and
small-scale, homogeneous training data to, most notably, a lack of rigorous
evaluation that results in overestimating the small models' capabilities, as
they tend to imitate the style, but not the reasoning process, of LFMs. To address these
challenges, we develop Orca (We are working with our legal team to publicly
release a diff of the model weights in accordance with LLaMA's release policy
to be published at https://aka.ms/orca-lm), a 13-billion parameter model that
learns to imitate the reasoning process of LFMs. Orca learns from rich signals
from GPT-4 including explanation traces; step-by-step thought processes; and
other complex instructions, guided by teacher assistance from ChatGPT. To
promote this progressive learning, we tap into large-scale and diverse
imitation data with judicious sampling and selection. Orca surpasses
conventional state-of-the-art instruction-tuned models such as Vicuna-13B by
more than 100% in complex zero-shot reasoning benchmarks like Big-Bench Hard
(BBH) and 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH
benchmark and shows competitive performance (4 pts gap with optimized system
message) in professional and academic examinations like the SAT, LSAT, GRE, and
GMAT, both in zero-shot settings without CoT; while trailing behind GPT-4. Our
research indicates that learning from step-by-step explanations, whether these
are generated by humans or more advanced AI models, is a promising direction to
improve model capabilities and skills.
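The core of the approach above is "explanation tuning": each training example pairs an instruction with a teacher response elicited under a system message that asks for step-by-step reasoning, so the student learns the explanation trace rather than just the final answer. A minimal sketch of assembling one such record is shown below; the field names and the example content are illustrative assumptions, not the paper's actual data schema.

```python
# Sketch of building one explanation-tuning record for imitation learning.
# The system message steers the teacher (e.g. GPT-4) toward step-by-step
# explanations; the teacher's full explanation trace is the training target.
def build_training_record(system_message: str,
                          user_instruction: str,
                          teacher_response: str) -> dict:
    """Pack one (system, instruction, response) triple for student fine-tuning."""
    return {
        "system": system_message,        # elicits step-by-step reasoning
        "instruction": user_instruction, # task from a large, diverse pool
        "response": teacher_response,    # explanation trace, not just the answer
    }

record = build_training_record(
    "You are a helpful assistant. Explain your reasoning step by step.",
    "If a train travels 60 km in 40 minutes, what is its speed in km/h?",
    "40 minutes is 2/3 of an hour. Speed = 60 km / (2/3 h) = 90 km/h.",
)
```

In this setup, the student would be fine-tuned to generate the `response` field conditioned on the `system` and `instruction` fields, which is what distinguishes learning the reasoning process from merely imitating surface style.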