RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Abstract

Large language models excel at function- and file-level code generation, yet generating complete repositories from scratch remains a fundamental challenge. This process demands coherent and reliable planning across proposal- and implementation-level stages, while natural language, due to its ambiguity and verbosity, is ill-suited for faithfully representing complex software structures. To address this, we introduce the Repository Planning Graph (RPG), a persistent representation that unifies proposal- and implementation-level planning by encoding capabilities, file structures, data flows, and functions in one graph. RPG replaces ambiguous natural language with an explicit blueprint, enabling long-horizon planning and scalable repository generation. Building on RPG, we develop ZeroRepo, a graph-driven framework for repository generation from scratch. It operates in three stages: proposal-level planning and implementation-level refinement to construct the graph, followed by graph-guided code generation with test validation. To evaluate this setting, we construct RepoCraft, a benchmark of six real-world projects with 1,052 tasks. On RepoCraft, ZeroRepo produces repositories averaging nearly 36K LOC, roughly 3.9× the strongest baseline (Claude Code) and about 64× other baselines. It attains 81.5% functional coverage and a 69.7% pass rate, exceeding Claude Code by 27.3 and 35.8 percentage points, respectively. Further analysis shows that RPG models complex dependencies, enables progressively more sophisticated planning through near-linear scaling, and enhances LLM understanding of repositories, thereby accelerating agent localization.
