Orca 2: Teaching Small Language Models How to Reason

Abstract

Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance the reasoning abilities of smaller language models (LMs). Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, strategies that may differ from the one used by the larger model. For example, a larger model might answer a complex task directly, whereas a smaller model may lack the capacity to do so. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance similar to or better than that of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. We open-source Orca 2 to encourage further research on the development, evaluation, and alignment of smaller LMs.
