

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

December 23, 2023
Authors: Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
cs.AI

Abstract

We introduce depth up-scaling (DUS), a novel technique to up-scale base LLMs efficiently and effectively in a simple manner. In contrast to mixture-of-experts (MoE), DUS does not require complex changes to training and inference. Using DUS, we build SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs, such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
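
As a rough illustration of how depth up-scaling could be realized in practice, the sketch below duplicates a base model's decoder layers, trims a few layers at the seam, and stacks the two copies into a deeper model that is then continually pretrained. The base model name, layer counts, and use of Hugging Face `transformers` internals (`model.model.layers`) are assumptions for illustration only, not the authors' released code.

```python
# Minimal sketch of depth up-scaling (DUS), assuming the common
# layer-duplication recipe: copy the base model's decoder blocks,
# drop a few at the seam, and concatenate the two trimmed copies.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM


def depth_up_scale(base_model_name: str = "mistralai/Mistral-7B-v0.1",
                   drop_per_copy: int = 8) -> nn.Module:
    """Build an up-scaled model by stacking two trimmed copies of the base."""
    model = AutoModelForCausalLM.from_pretrained(base_model_name)
    layers = model.model.layers          # ModuleList of decoder blocks (n = 32 for a 7B base)
    n = len(layers)

    # First copy keeps the bottom n - drop_per_copy layers,
    # second copy keeps the top n - drop_per_copy layers.
    bottom = [copy.deepcopy(layer) for layer in layers[: n - drop_per_copy]]
    top = [copy.deepcopy(layer) for layer in layers[drop_per_copy:]]

    # Resulting depth: 2n - 2 * drop_per_copy (e.g. 48 layers for n = 32, drop 8).
    # Note: recent transformers versions track a per-layer layer_idx for KV
    # caching, which may need re-indexing after duplication.
    model.model.layers = nn.ModuleList(bottom + top)
    model.config.num_hidden_layers = len(model.model.layers)
    return model


# The up-scaled model would then undergo continued pretraining to recover
# and improve performance before any instruction fine-tuning.
```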