DeepCode: Open Agentic Coding
December 8, 2025
Authors: Zongwei Li, Zhonghang Li, Zirui Guo, Xubin Ren, Chao Huang
cs.AI
Abstract
Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still struggle to achieve high-fidelity document-to-codebase synthesis (for example, generating code from a scientific paper), primarily because of a fundamental conflict between information overload and the context bottleneck of LLMs. In this work, we introduce DeepCode, a fully autonomous framework that addresses this challenge through principled information-flow management. By treating repository synthesis as a channel optimization problem, DeepCode orchestrates four information operations to maximize task-relevant signal under a finite context budget: source compression via blueprint distillation, structured indexing using stateful code memory, conditional knowledge injection via retrieval-augmented generation, and closed-loop error correction. Extensive evaluations on the PaperBench benchmark demonstrate that DeepCode achieves state-of-the-art performance, decisively outperforming leading commercial agents such as Cursor and Claude Code and, crucially, surpassing PhD-level human experts from top institutions on key reproduction metrics. By systematically transforming paper specifications into production-grade implementations of a quality comparable to that of human experts, this work establishes a new foundation for autonomous scientific reproduction that can accelerate research evaluation and discovery.
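The notion of maximizing task-relevant signal under a finite context budget can be illustrated with a small sketch. This is not DeepCode's implementation: the `Snippet` type, the `pack_context` function, the relevance scores, and the token counts are all hypothetical, and the greedy relevance-per-token heuristic is a stand-in chosen only to make the budget constraint concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """A candidate piece of context (hypothetical type, for illustration only)."""
    name: str
    tokens: int       # assumed token cost of including this snippet
    relevance: float  # assumed task-relevance score in [0, 1]


def pack_context(snippets, budget):
    """Greedily select snippets by relevance per token without exceeding the budget.

    Returns the chosen snippets (in selection order) and the tokens used.
    """
    if any(s.tokens <= 0 for s in snippets):
        raise ValueError("every snippet must have a positive token cost")

    # Rank by signal density: relevance gained per token spent.
    ranked = sorted(snippets, key=lambda s: s.relevance / s.tokens, reverse=True)

    chosen, used = [], 0
    for s in ranked:
        if used + s.tokens <= budget:
            chosen.append(s)
            used += s.tokens
    return chosen, used


# Illustrative inputs; the names loosely echo the operations listed in the abstract.
candidates = [
    Snippet("blueprint_summary", tokens=300, relevance=0.9),
    Snippet("code_memory_index", tokens=200, relevance=0.7),
    Snippet("retrieved_reference", tokens=500, relevance=0.6),
    Snippet("raw_paper_appendix", tokens=2000, relevance=0.8),
]

selected, used_tokens = pack_context(candidates, budget=1000)
print([s.name for s in selected], used_tokens)
```

In this toy setting the long, uncompressed appendix is left out even though its raw relevance is high, because compact summaries deliver more signal per token. That is the intuition behind compressing the source before it enters the context window.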