SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Abstract

We introduce depth up-scaling (DUS), a novel technique for scaling up base large language models (LLMs) simply, efficiently, and effectively. In contrast to mixture-of-experts (MoE) approaches, DUS does not require complex changes to training or inference. Using DUS, we build SOLAR 10.7B, an LLM with 10.7 billion parameters that demonstrates superior performance on various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction following, which surpasses Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
