
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

January 25, 2024
Authors: Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang
cs.AI

Abstract

The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens. These models are pre-trained on a high-quality project-level code corpus and employ a fill-in-the-blank task with a 16K window to enhance code generation and infilling. Our extensive evaluations demonstrate that DeepSeek-Coder not only achieves state-of-the-art performance among open-source code models across multiple benchmarks but also surpasses existing closed-source models like Codex and GPT-3.5. Furthermore, DeepSeek-Coder models are under a permissive license that allows for both research and unrestricted commercial use.
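The fill-in-the-blank objective mentioned in the abstract means the model can complete a gap conditioned on both the code before it and the code after it, rather than only continuing a left-to-right prefix. Below is a minimal sketch of such infilling with one of the released base checkpoints via Hugging Face transformers. The `deepseek-ai/deepseek-coder-1.3b-base` model ID and the sentinel token strings follow the project's public repository, but they are assumptions here rather than details stated in this abstract, so verify them against the model's tokenizer before relying on them.

```python
# A minimal sketch of fill-in-the-middle (FIM) code infilling with a
# DeepSeek-Coder base model. Assumption: the sentinel tokens and model ID
# below match the DeepSeek-Coder repository; check the tokenizer config.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"  # smallest model in the series
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# FIM prompt: the model generates the code belonging at the <｜fim▁hole｜>
# position, conditioned on both the preceding prefix and the trailing suffix.
prompt = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
<｜fim▁hole｜>
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, i.e. the infilled middle segment.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

In this format the suffix (here, the final `return` line) steers the completion: the model should produce the partitioning code that defines `left` and `right`, which plain prefix-only generation could not be constrained to do.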