DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Abstract

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature reach varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on the DeepSeek LLM Base models, resulting in the DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
