RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Abstract

Large language models excel at function- and file-level code generation, yet generating complete repositories from scratch remains a fundamental challenge. This process demands coherent and reliable planning across proposal- and implementation-level stages, while natural language, due to its ambiguity and verbosity, is ill-suited for faithfully representing complex software structures. To address this, we introduce the Repository Planning Graph (RPG), a persistent representation that unifies proposal- and implementation-level planning by encoding capabilities, file structures, data flows, and functions in one graph. RPG replaces ambiguous natural language with an explicit blueprint, enabling long-horizon planning and scalable repository generation. Building on RPG, we develop ZeroRepo, a graph-driven framework for repository generation from scratch. It operates in three stages: proposal-level planning and implementation-level refinement, which together construct the graph, followed by graph-guided code generation with test validation. To evaluate this setting, we construct RepoCraft, a benchmark of six real-world projects with 1,052 tasks. On RepoCraft, ZeroRepo produces repositories averaging nearly 36K lines of code (LOC), roughly 3.9× the strongest baseline (Claude Code) and about 64× other baselines. It attains 81.5% functional coverage and a 69.7% pass rate, exceeding Claude Code by 27.3 and 35.8 percentage points, respectively. Further analysis shows that RPG models complex dependencies, enables progressively more sophisticated planning through near-linear scaling, and enhances LLM understanding of repositories, thereby accelerating agent localization.
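To make the idea of encoding capabilities, file structures, data flows, and functions in one graph more concrete, the following is a minimal illustrative sketch. It is not the RPG or ZeroRepo implementation: the class names (`Node`, `PlanningGraph`), the node kinds, and the example module names are all hypothetical, and the abstract does not specify how the graph is stored or traversed. The sketch assumes only that capability nodes contain files, files contain functions, and data-flow edges between functions constrain the order in which code is generated.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Node:
    name: str
    kind: str  # hypothetical labels: "capability", "file", or "function"
    children: list = field(default_factory=list)


@dataclass
class PlanningGraph:
    nodes: dict = field(default_factory=dict)
    data_flows: list = field(default_factory=list)  # (producer, consumer) pairs

    def add_node(self, name, kind, parent=None):
        # Hierarchy edges: capability -> file -> function.
        if parent is not None and parent not in self.nodes:
            raise ValueError(f"unknown parent: {parent}")
        self.nodes[name] = Node(name, kind)
        if parent is not None:
            self.nodes[parent].children.append(name)

    def add_flow(self, producer, consumer):
        # Data-flow edges are only allowed between function nodes here.
        for name in (producer, consumer):
            if name not in self.nodes or self.nodes[name].kind != "function":
                raise ValueError(f"not a function node: {name}")
        self.data_flows.append((producer, consumer))

    def generation_order(self):
        # Topological order: each function comes after the functions it consumes from.
        deps = {n.name: set() for n in self.nodes.values() if n.kind == "function"}
        for producer, consumer in self.data_flows:
            deps[consumer].add(producer)
        return list(TopologicalSorter(deps).static_order())


# Hypothetical example: a tiny three-function pipeline.
graph = PlanningGraph()
graph.add_node("data_loading", "capability")
graph.add_node("io/loader.py", "file", parent="data_loading")
graph.add_node("load_csv", "function", parent="io/loader.py")
graph.add_node("preprocessing", "capability")
graph.add_node("prep/scale.py", "file", parent="preprocessing")
graph.add_node("standardize", "function", parent="prep/scale.py")
graph.add_node("modeling", "capability")
graph.add_node("models/linear.py", "file", parent="modeling")
graph.add_node("fit_model", "function", parent="models/linear.py")
graph.add_flow("load_csv", "standardize")
graph.add_flow("standardize", "fit_model")

order = graph.generation_order()
print(order)
```

In this sketch the hierarchy edges carry the proposal-level view (what the repository should do and where it lives), while the data-flow edges carry the implementation-level view (which function feeds which). Keeping both in one structure is the property the abstract attributes to RPG; the topological traversal is an assumed, simplified stand-in for graph-guided generation.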
