DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
January 5, 2024
Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, Wenfeng Liang, Fangyun Lin, A. X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
cs.AI
Abstract
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature present varying conclusions, which casts doubt on the prospect of scaling LLMs. We delve into the study of scaling laws and present our distinctive findings, which facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by these scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on the DeepSeek LLM Base models, resulting in the DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat outperforms GPT-3.5.
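For context on the scaling-law analysis mentioned above, the sketch below shows a widely used compute-optimal parameterization in the style of Chinchilla-type analyses. The symbols N (model scale), D (data scale), C (compute budget), and the fitted constants E, A, B, alpha, beta, a, b are generic placeholders; the paper's own formulation, including its choice of model-scale representation, may differ.

\[
C \approx 6\,N D, \qquad
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad
N_{\mathrm{opt}} \propto C^{a}, \quad D_{\mathrm{opt}} \propto C^{b}
\]

Fitting such a loss surface on small-scale runs and extrapolating the compute-optimal allocation of parameters and data is the standard way scaling laws guide the choice of configurations such as 7B and 67B.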
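The Direct Preference Optimization (DPO) step mentioned in the abstract conventionally refers to the standard objective of Rafailov et al. (2023), reproduced below as a reference sketch; the paper's specific preference data and hyperparameters (such as the value of beta) are not specified in the abstract.

\[
\mathcal{L}_{\mathrm{DPO}}(\pi_{\theta}; \pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_{\theta}(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_{\theta}(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
\]

Here y_w and y_l are the preferred and dispreferred responses for prompt x, \pi_{\mathrm{ref}} is the frozen reference model (typically the SFT model), and \beta controls how far the optimized policy \pi_{\theta} may drift from it.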