FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation
February 3, 2026
Authors: Zimu Lu, Houxing Ren, Yunqiao Yang, Ke Wang, Zhuofan Zong, Mingjie Zhan, Hongsheng Li
cs.AI
Abstract
Helping non-expert users develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to generate only frontend web pages, masking the lack of real full-stack data processing and storage with fancy visual effects. Constructing production-level full-stack web applications is far more challenging than generating frontend pages alone: it demands careful control of data flow, a comprehensive understanding of constantly updated packages and dependencies, and accurate localization of obscure bugs in the codebase. To address these difficulties, we introduce FullStack-Agent, a unified agent system for full-stack agentic coding that consists of three parts: (1) FullStack-Dev, a multi-agent framework with strong planning, code editing, codebase navigation, and bug localization abilities; (2) FullStack-Learn, an innovative data-scaling and self-improving method that back-translates crawled and synthesized website repositories to improve the backbone LLM of FullStack-Dev; and (3) FullStack-Bench, a comprehensive benchmark that systematically tests the frontend, backend, and database functionalities of the generated website. FullStack-Dev outperforms the previous state-of-the-art method by 8.7%, 38.2%, and 15.9% on the frontend, backend, and database test cases, respectively. Additionally, FullStack-Learn raises the performance of a 30B model by 9.7%, 9.5%, and 2.8% on the three sets of test cases through self-improvement, demonstrating the effectiveness of our approach. The code is released at https://github.com/mnluzimu/FullStack-Agent.