SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
December 23, 2023
作者: Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
cs.AI
Abstract
We introduce depth up-scaling (DUS), a novel technique to up-scale base LLMs efficiently and effectively in a simple manner. In contrast to mixture-of-experts (MoE), DUS does not require complex changes to training and inference. Using DUS, we build SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs, such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
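For intuition on the depth up-scaling step, here is a minimal sketch of the depthwise scaling described in the paper: the base model's n decoder layers are duplicated, the last m layers are dropped from one copy and the first m from the other, and the two trimmed copies are stacked (for SOLAR 10.7B, n = 32 and m = 8, giving 48 layers), after which the up-scaled model is continually pretrained. The helper name `depth_up_scale` and the use of a plain `nn.ModuleList` are illustrative assumptions, not the authors' code.

```python
import copy
import torch.nn as nn

def depth_up_scale(base_layers: nn.ModuleList, m: int = 8) -> nn.ModuleList:
    """Depthwise scaling sketch: duplicate the base model's n layers, drop the
    last m from the first copy and the first m from the second copy, then
    stack the trimmed copies, giving 2 * (n - m) layers in total."""
    n = len(base_layers)
    first_copy = [copy.deepcopy(base_layers[i]) for i in range(0, n - m)]  # layers 0 .. n-m-1
    second_copy = [copy.deepcopy(base_layers[i]) for i in range(m, n)]     # layers m .. n-1
    return nn.ModuleList(first_copy + second_copy)

# Example: a 32-layer base model becomes a 48-layer up-scaled model (m = 8),
# which is then continually pretrained to recover and surpass base performance.
```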