FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Abstract

Helping non-expert users develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to generate only frontend web pages, masking the lack of real full-stack data processing and storage behind fancy visual effects. Building production-level full-stack web applications is far more challenging than generating frontend pages alone: it demands careful control of data flow, a comprehensive understanding of constantly updating packages and dependencies, and accurate localization of obscure bugs in the codebase. To address these difficulties, we introduce FullStack-Agent, a unified agent system for full-stack agentic coding that consists of three parts: (1) FullStack-Dev, a multi-agent framework with strong planning, code editing, codebase navigation, and bug localization abilities; (2) FullStack-Learn, an innovative data-scaling and self-improving method that back-translates crawled and synthesized website repositories to improve the backbone LLM of FullStack-Dev; and (3) FullStack-Bench, a comprehensive benchmark that systematically tests the frontend, backend, and database functionalities of the generated website. FullStack-Dev outperforms the previous state-of-the-art method by 8.7%, 38.2%, and 15.9% on the frontend, backend, and database test cases, respectively. Additionally, FullStack-Learn raises the performance of a 30B model by 9.7%, 9.5%, and 2.8% on the three sets of test cases through self-improvement, demonstrating the effectiveness of our approach. The code is released at https://github.com/mnluzimu/FullStack-Agent.
