

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

December 23, 2023
Authors: Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
cs.AI

Abstract

We introduce depth up-scaling (DUS), a novel technique to up-scale base LLMs efficiently and effectively in a simple manner. In contrast to mixture-of-experts (MoE), DUS does not require complex changes to training and inference. Using DUS, we build SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs, such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
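
As a rough illustration of how depth up-scaling could be realized in practice, the sketch below duplicates a base model's decoder layers, trims a few layers at the seam, and stacks the two copies into a deeper model that is then continually pretrained. The base model name, layer counts, and use of Hugging Face `transformers` internals (`model.model.layers`) are assumptions for illustration only, not the authors' released code.

```python
# Minimal sketch of depth up-scaling (DUS), assuming the common
# layer-duplication recipe: copy the base model's decoder blocks,
# drop a few at the seam, and concatenate the two trimmed copies.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM


def depth_up_scale(base_model_name: str = "mistralai/Mistral-7B-v0.1",
                   drop_per_copy: int = 8) -> nn.Module:
    """Build an up-scaled model by stacking two trimmed copies of the base."""
    model = AutoModelForCausalLM.from_pretrained(base_model_name)
    layers = model.model.layers          # ModuleList of decoder blocks (n = 32 for a 7B base)
    n = len(layers)

    # First copy keeps the bottom n - drop_per_copy layers,
    # second copy keeps the top n - drop_per_copy layers.
    bottom = [copy.deepcopy(layer) for layer in layers[: n - drop_per_copy]]
    top = [copy.deepcopy(layer) for layer in layers[drop_per_copy:]]

    # Resulting depth: 2n - 2 * drop_per_copy (e.g. 48 layers for n = 32, drop 8).
    # Note: recent transformers versions track a per-layer layer_idx for KV
    # caching, which may need re-indexing after duplication.
    model.model.layers = nn.ModuleList(bottom + top)
    model.config.num_hidden_layers = len(model.model.layers)
    return model


# The up-scaled model would then undergo continued pretraining to recover
# and improve performance before any instruction fine-tuning.
```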