DeepCode: Open Agentic Coding
December 8, 2025
Authors: Zongwei Li, Zhonghang Li, Zirui Guo, Xubin Ren, Chao Huang
cs.AI
Abstract
Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still struggle to achieve high-fidelity document-to-codebase synthesis (for example, turning a scientific paper into a working code repository), primarily because of a fundamental conflict between information overload and the context bottlenecks of LLMs. In this work, we introduce DeepCode, a fully autonomous framework that addresses this challenge through principled information-flow management. By treating repository synthesis as a channel optimization problem, DeepCode orchestrates four information operations to maximize task-relevant signals under finite context budgets: source compression via blueprint distillation, structured indexing using stateful code memory, conditional knowledge injection via retrieval-augmented generation, and closed-loop error correction. Extensive evaluations on the PaperBench benchmark demonstrate that DeepCode achieves state-of-the-art performance, outperforming leading commercial agents such as Cursor and Claude Code and, crucially, surpassing PhD-level human experts from top institutions on key reproduction metrics. By systematically transforming paper specifications into production-grade implementations comparable in quality to those of human experts, this work lays new foundations for autonomous scientific reproduction that can accelerate research evaluation and discovery.
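The abstract names the four information operations but does not spell out an algorithm. As a rough illustration of their shared goal (keeping the most task-relevant signal inside a finite context budget), the Python sketch below packs file summaries from a toy "code memory" into a fixed token budget by greedy relevance-per-token selection. Everything here is an assumption for illustration only: `MemoryEntry`, `approx_tokens`, `relevance`, and `build_context` are hypothetical names, the word-overlap score stands in for real retrieval, and the greedy knapsack heuristic is not claimed to be DeepCode's actual method.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """Hypothetical code-memory record: a compact summary of an already
    generated file, kept in place of its full source to save context."""
    path: str
    summary: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def relevance(task: str, text: str) -> float:
    # Hypothetical relevance score: fraction of task words that appear in the text.
    task_words = set(task.lower().split())
    if not task_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(task_words & text_words) / len(task_words)


def build_context(task: str, memory: list[MemoryEntry], budget: int) -> list[MemoryEntry]:
    """Greedily pack the entries with the highest relevance per token into a
    fixed token budget (a knapsack heuristic, not an exact optimum)."""
    scored = []
    for entry in memory:
        cost = approx_tokens(entry.summary)
        score = relevance(task, entry.summary)
        if score > 0 and cost > 0:  # drop irrelevant or empty entries
            scored.append((score / cost, cost, entry))
    # Highest signal density first; sorting by key avoids comparing entries.
    scored.sort(key=lambda item: item[0], reverse=True)

    chosen, used = [], 0
    for _density, cost, entry in scored:
        if used + cost <= budget:  # add only if it still fits the budget
            chosen.append(entry)
            used += cost
    return chosen


# Usage with a toy memory of three summarized files.
memory = [
    MemoryEntry("model.py", "class Encoder forward pass attention layers"),
    MemoryEntry("data.py", "dataset loader tokenization batching"),
    MemoryEntry("train.py", "training loop optimizer encoder loss"),
]
picked = build_context("implement encoder training loop", memory, budget=10)
print([e.path for e in picked])  # prints ['train.py']
```

With a budget of 10 tokens, only `train.py` (3 of 4 task words matched in 5 tokens) fits; `model.py` would push the total to 11 and is skipped, while `data.py` shares no task words and is never considered. Storing short summaries rather than whole files is what lets more relevant items fit into the same budget.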