Closing the Loop: Universal Repository Representation with RPG-Encoder
February 2, 2026
Authors: Jane Luo, Chengyu Yin, Xin Zhang, Qingtao Li, Steven Liu, Yiming Huang, Jie Wu, Hao Liu, Yangyu Huang, Yu Kang, Fangkai Yang, Ying Xin, Scarlett Li
cs.AI
Abstract
Current repository agents encounter a reasoning disconnect due to fragmented representations, as existing methods rely on isolated API documentation or dependency graphs that lack semantic depth. We consider repository comprehension and generation to be inverse processes within a unified cycle: generation expands intent into implementation, while comprehension compresses implementation back into intent. To address this, we propose RPG-Encoder, a framework that generalizes the Repository Planning Graph (RPG) from a static generative blueprint into a unified, high-fidelity representation. RPG-Encoder closes the reasoning loop through three mechanisms: (1) Encoding raw code into the RPG that combines lifted semantic features with code dependencies; (2) Evolving the topology incrementally to decouple maintenance costs from repository scale, reducing overhead by 95.7%; and (3) Operating as a unified interface for structure-aware navigation. In evaluations, RPG-Encoder establishes state-of-the-art repository understanding on SWE-bench Verified with 93.7% Acc@5 and exceeds the best baseline by over 10% on SWE-bench Live Lite. These results highlight our superior fine-grained localization accuracy in complex codebases. Furthermore, it achieves 98.5% reconstruction coverage on RepoCraft, confirming RPG's high-fidelity capacity to mirror the original codebase and closing the loop between intent and implementation.
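The three mechanisms named in the abstract (encoding code into a graph of semantic features plus dependencies, updating that graph incrementally, and navigating through it) can be illustrated with a toy data structure. The sketch below is not the paper's implementation: the class `ToyRPG`, its methods, the entity paths, and the feature strings are all hypothetical, and the semantic features are hand-written strings where a real system would presumably derive them from the code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    path: str                                    # code entity, e.g. "io.py::load" (hypothetical naming)
    feature: str                                 # short semantic description of what the entity does
    deps: set[str] = field(default_factory=set)  # outgoing dependency edges (paths this entity uses)


class ToyRPG:
    """Illustrative graph only; an assumption-laden stand-in, not RPG-Encoder."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def upsert(self, path: str, feature: str, deps=()) -> None:
        # Mechanism 1 (encoding): store a semantic feature together with dependencies.
        self.nodes[path] = Node(path, feature, set(deps))

    def apply_change(self, changed: dict, removed=()) -> int:
        # Mechanism 2 (incremental evolution): rewrite only the entities that
        # changed or were removed; untouched nodes are not re-encoded.
        removed = list(removed)
        for path in removed:
            self.nodes.pop(path, None)
            for node in self.nodes.values():
                node.deps.discard(path)          # drop dangling edges to the removed entity
        for path, (feature, deps) in changed.items():
            self.upsert(path, feature, deps)
        return len(changed) + len(removed)       # number of entities rewritten or deleted

    def search(self, keyword: str) -> list[str]:
        # Mechanism 3 (navigation): look up entities by semantic feature ...
        kw = keyword.lower()
        return sorted(p for p, n in self.nodes.items() if kw in n.feature.lower())

    def dependents(self, path: str) -> list[str]:
        # ... or walk dependency edges in reverse.
        return sorted(p for p, n in self.nodes.items() if path in n.deps)


rpg = ToyRPG()
rpg.upsert("io.py::load", "read dataset from disk")
rpg.upsert("train.py::fit", "train model on dataset", deps=["io.py::load"])

# A single edited function triggers a single node update.
touched = rpg.apply_change({"io.py::load": ("read dataset from disk or URL", [])})
hits = rpg.search("dataset")
users = rpg.dependents("io.py::load")
```

In this toy version the cost of `apply_change` depends on how many entities the edit touches, not on how many nodes the graph holds, which is the intuition behind decoupling maintenance cost from repository scale. Removals are the exception here, since the sketch scans all nodes to clean up edges.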