DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

January 25, 2024
Authors: Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang
cs.AI

Abstract

The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens. These models are pre-trained on a high-quality project-level code corpus and employ a fill-in-the-blank task with a 16K window to enhance code generation and infilling. Our extensive evaluations demonstrate that DeepSeek-Coder not only achieves state-of-the-art performance among open-source code models across multiple benchmarks but also surpasses existing closed-source models like Codex and GPT-3.5. Furthermore, DeepSeek-Coder models are under a permissive license that allows for both research and unrestricted commercial use.
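The fill-in-the-blank objective mentioned above trains the model to generate a missing middle span of code given both the preceding and following context, rather than only a left-to-right prefix. A minimal sketch of how such an infilling prompt is typically arranged is shown below; the sentinel token names are hypothetical placeholders for illustration, not the model's actual special tokens.

```python
# Hypothetical sentinel tokens marking the parts of an infilling prompt.
PREFIX_TOK = "<fim_prefix>"
SUFFIX_TOK = "<fim_suffix>"
MIDDLE_TOK = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange prefix and suffix so the model generates the missing middle
    after the final sentinel (prefix-suffix-middle ordering)."""
    return f"{PREFIX_TOK}{prefix}{SUFFIX_TOK}{suffix}{MIDDLE_TOK}"

# Example: ask the model to fill in the body of a function.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result\n",
)
```

At inference time the model continues generating after the middle sentinel, producing code (here, something like `result = a + b`) that fits between the given prefix and suffix.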