Closing the Loop: Universal Repository Representation with RPG-Encoder
February 2, 2026
Authors: Jane Luo, Chengyu Yin, Xin Zhang, Qingtao Li, Steven Liu, Yiming Huang, Jie Wu, Hao Liu, Yangyu Huang, Yu Kang, Fangkai Yang, Ying Xin, Scarlett Li
cs.AI
Abstract
Current repository agents suffer from a reasoning disconnect caused by fragmented representations: existing methods rely either on isolated API documentation or on dependency graphs that lack semantic depth. We treat repository comprehension and generation as inverse processes within a single cycle: generation expands intent into implementation, while comprehension compresses implementation back into intent. Building on this view, we propose RPG-Encoder, a framework that generalizes the Repository Planning Graph (RPG) from a static generative blueprint into a unified, high-fidelity representation. RPG-Encoder closes the reasoning loop through three mechanisms: (1) encoding raw code into an RPG that combines lifted semantic features with code dependencies; (2) evolving the graph topology incrementally, which decouples maintenance cost from repository scale and reduces overhead by 95.7%; and (3) serving as a unified interface for structure-aware navigation. In evaluations, RPG-Encoder achieves state-of-the-art repository understanding on SWE-bench Verified with 93.7% Acc@5 and exceeds the best baseline by over 10% on SWE-bench Live Lite. These results highlight RPG-Encoder's fine-grained localization accuracy in complex codebases. Furthermore, RPG-Encoder achieves 98.5% reconstruction coverage on RepoCraft, confirming that the RPG can mirror the original codebase with high fidelity and closing the loop between intent and implementation.
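To make mechanisms (1) and (2) concrete, the following is a minimal sketch of a graph that stores a semantic summary alongside dependency edges and re-encodes only the files that changed. It is not the authors' implementation: the names `Node`, `ToyRPG`, `encode_calls`, and the first-line "summary" heuristic are all hypothetical stand-ins chosen for illustration, and the toy sources use a made-up `import <name>` convention.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                    # module the node was derived from
    feature: str                                 # short semantic summary (hypothetical field)
    deps: set[str] = field(default_factory=set)  # modules this one depends on


class ToyRPG:
    """Illustrative graph pairing semantic features with dependencies (assumption)."""

    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.encode_calls = 0  # how many modules were (re-)encoded so far

    def _encode(self, name, source):
        # Placeholder for semantic lifting: use the first line as the summary.
        self.encode_calls += 1
        lines = source.strip().splitlines()
        feature = lines[0] if lines else ""
        deps = set()
        for line in lines:
            parts = line.split()
            if len(parts) == 2 and parts[0] == "import":
                deps.add(parts[1])
        self.nodes[name] = Node(name, feature, deps)

    def build(self, repo):
        # Full encoding pass: cost grows with repository size.
        for name, source in repo.items():
            self._encode(name, source)

    def update(self, changed, removed=()):
        # Incremental pass: cost grows with the size of the change only.
        for name in removed:
            self.nodes.pop(name, None)
        for name, source in changed.items():
            self._encode(name, source)

    def dependents(self, name):
        # Structure-aware lookup: which modules depend on `name`?
        return sorted(n.name for n in self.nodes.values() if name in n.deps)


repo = {
    "a": "# parse config\nimport b",
    "b": "# load files",
    "c": "# cli entry\nimport a",
}
graph = ToyRPG()
graph.build(repo)                           # encodes all three modules
graph.update({"b": "# load files lazily"})  # re-encodes only module b
```

In this toy setup the update touches one module regardless of how many the repository holds, which is the intuition behind decoupling maintenance cost from repository scale; the 95.7% figure in the abstract comes from the authors' own system and is not reproduced by this sketch.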