FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Abstract

Helping non-expert users develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to generate only frontend web pages, masking the lack of real full-stack data processing and storage with fancy visual effects. Constructing production-level full-stack web applications is far more challenging than generating frontend pages alone: it demands careful control of data flow, a comprehensive understanding of constantly updating packages and dependencies, and accurate localization of obscure bugs in the codebase. To address these difficulties, we introduce FullStack-Agent, a unified agent system for full-stack agentic coding that consists of three parts: (1) FullStack-Dev, a multi-agent framework with strong planning, code editing, codebase navigation, and bug localization abilities; (2) FullStack-Learn, an innovative data-scaling and self-improving method that back-translates crawled and synthesized website repositories to improve the backbone LLM of FullStack-Dev; and (3) FullStack-Bench, a comprehensive benchmark that systematically tests the frontend, backend, and database functionalities of the generated website. FullStack-Dev outperforms the previous state-of-the-art method by 8.7%, 38.2%, and 15.9% on the frontend, backend, and database test cases, respectively. Additionally, FullStack-Learn raises the performance of a 30B model by 9.7%, 9.5%, and 2.8% on the three sets of test cases through self-improvement, demonstrating the effectiveness of our approach. The code is released at https://github.com/mnluzimu/FullStack-Agent.
