SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
December 23, 2023
作者: Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
cs.AI
Abstract
We introduce depth up-scaling (DUS), a novel technique to up-scale base LLMs efficiently and effectively in a simple manner. In contrast to mixture-of-experts (MoE), DUS does not require complex changes to training and inference. Using DUS, we build SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs, such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
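For intuition on the depth up-scaling step, here is a minimal sketch of the depthwise scaling described in the paper: the base model's n decoder layers are duplicated, the last m layers are dropped from one copy and the first m from the other, and the two trimmed copies are stacked (for SOLAR 10.7B, n = 32 and m = 8, giving 48 layers), after which the up-scaled model is continually pretrained. The helper name `depth_up_scale` and the use of a plain `nn.ModuleList` are illustrative assumptions, not the authors' code.

```python
import copy
import torch.nn as nn

def depth_up_scale(base_layers: nn.ModuleList, m: int = 8) -> nn.ModuleList:
    """Depthwise scaling sketch: duplicate the base model's n layers, drop the
    last m from the first copy and the first m from the second copy, then
    stack the trimmed copies, giving 2 * (n - m) layers in total."""
    n = len(base_layers)
    first_copy = [copy.deepcopy(base_layers[i]) for i in range(0, n - m)]  # layers 0 .. n-m-1
    second_copy = [copy.deepcopy(base_layers[i]) for i in range(m, n)]     # layers m .. n-1
    return nn.ModuleList(first_copy + second_copy)

# Example: a 32-layer base model becomes a 48-layer up-scaled model (m = 8),
# which is then continually pretrained to recover and surpass base performance.
```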