DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

June 17, 2024
Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang
cs.AI

Abstract

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements across code-related tasks, as well as in reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its programming-language support from 86 to 338 languages, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks.
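The abstract describes DeepSeek-Coder-V2 as a Mixture-of-Experts (MoE) model. As an illustration only, and not the paper's actual architecture or routing scheme, a minimal sketch of the core MoE idea of top-k expert routing might look like this, with toy scalar functions standing in for the expert feed-forward networks and a fixed score list standing in for a learned router:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of gate scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, top_k=2):
    """Route input x to the top_k experts with the highest gate scores,
    combining their outputs weighted by renormalized gate probabilities."""
    probs = softmax(gate_scores)
    # indices of the top_k largest gate probabilities
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # weighted sum over only the selected experts (sparse activation)
    return sum(probs[i] / norm * experts[i](x) for i in top)

# toy experts: simple scalar functions in place of FFN sub-networks
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
gate_scores = [0.1, 2.0, 0.5, 1.5]  # produced by a learned router in practice
y = moe_forward(3.0, experts, gate_scores, top_k=2)
```

Because only `top_k` of the experts run per input, an MoE model can hold far more total parameters than it activates for any single token, which is what lets such models scale capacity without a proportional increase in per-token compute.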

