FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Abstract

Helping non-expert users develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to generate only frontend web pages, masking the lack of real full-stack data processing and storage with fancy visual effects. Constructing production-level full-stack web applications is far more challenging than generating frontend pages alone: it demands careful control of data flow, comprehensive understanding of constantly updating packages and dependencies, and accurate localization of obscure bugs in the codebase. To address these difficulties, we introduce FullStack-Agent, a unified agent system for full-stack agentic coding that consists of three parts: (1) FullStack-Dev, a multi-agent framework with strong planning, code editing, codebase navigation, and bug localization abilities; (2) FullStack-Learn, an innovative data-scaling and self-improving method that back-translates crawled and synthesized website repositories to improve the backbone LLM of FullStack-Dev; and (3) FullStack-Bench, a comprehensive benchmark that systematically tests the frontend, backend, and database functionalities of the generated website. FullStack-Dev outperforms the previous state-of-the-art method by 8.7%, 38.2%, and 15.9% on the frontend, backend, and database test cases, respectively. Additionally, FullStack-Learn raises the performance of a 30B model by 9.7%, 9.5%, and 2.8% on the three sets of test cases through self-improvement, demonstrating the effectiveness of our approach. The code is released at https://github.com/mnluzimu/FullStack-Agent.
