

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

January 5, 2024
Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, Wenfeng Liang, Fangyun Lin, A. X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
cs.AI

Abstract

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
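The abstract notes that the DeepSeek Chat models are obtained by applying supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO) to the DeepSeek LLM Base models. The snippet below is a minimal PyTorch sketch of the standard DPO objective for orientation only; the function name, tensor arguments, and the beta value are illustrative assumptions and are not taken from the paper.

# Minimal sketch of the standard DPO loss, for illustration only;
# names and beta are assumptions, not DeepSeek LLM settings.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Encourage the policy to prefer chosen over rejected responses,
    measured relative to a frozen reference (e.g., SFT) model."""
    # Log-probability ratios of policy vs. reference for each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin), averaged over the preference pairs in the batch.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

Each input tensor holds per-sequence summed log-probabilities for a batch of preference pairs; the loss pushes the policy's margin between chosen and rejected responses above that of the reference model.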