Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Abstract

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). Several issues limit the quality of these models: limited imitation signals from shallow LFM outputs, small-scale homogeneous training data, and, most notably, a lack of rigorous evaluation, which leads to overestimating the small models' capability because they tend to learn to imitate the style, but not the reasoning process, of LFMs. To address these challenges, we develop Orca, a 13-billion parameter model that learns to imitate the reasoning process of LFMs. (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy, to be published at https://aka.ms/orca-lm.) Orca learns from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and by 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (a 4-point gap with an optimized system message) on professional and academic examinations such as the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without chain-of-thought (CoT) prompting, while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether generated by humans or by more advanced AI models, is a promising direction for improving model capabilities and skills.
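
The abstract reports two kinds of comparison: percentage gains ("more than 100%", "42%"), read here as improvement relative to the baseline score, and a gap measured in raw points ("4 pts"). A minimal Python sketch of the difference follows. The scores are made-up placeholders chosen only for illustration, not results from the paper, and the function names are hypothetical.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline`, relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def absolute_gap(reference: float, model: float) -> float:
    """Difference in raw score points between a reference and a model."""
    return reference - model


# Placeholder scores (NOT from the paper), on a 0-100 accuracy scale.
baseline_score = 20.0
student_score = 45.0
reference_score = 49.0

print(relative_improvement(baseline_score, student_score))  # 125.0
print(absolute_gap(reference_score, student_score))  # 4.0
```

Under this reading, a relative improvement above 100% means the new score is more than double the baseline, while a 4-point gap is a plain difference between two scores.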
