DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Abstract

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo on code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance on general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 shows significant advances across a range of code-related tasks, as well as in reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338 and extends the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks.
