MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use

Abstract

When a human asks an LLM to complete a coding task using functionality from a large code repository, how do we provide context from the repo to the LLM? One approach is to add the entire repo to the LLM's context window. However, most tasks involve only a small fraction of the symbols in a repo, longer contexts are detrimental to the LLM's reasoning abilities, and context windows are not unlimited. Alternatively, we could emulate the human ability to navigate a large repo, pick out the right functionality, and form a plan to solve the task. We propose MutaGReP (Mutation-guided Grounded Repository Plan Search), an approach to search for plans that decompose a user request into natural language steps grounded in the codebase. MutaGReP performs neural tree search in plan space, exploring by mutating plans and using a symbol retriever for grounding. On the challenging LongCodeArena benchmark, our plans use less than 5% of the 128K context window for GPT-4o but rival the coding performance of GPT-4o with a context window filled with the repo. Plans produced by MutaGReP allow Qwen 2.5 Coder 32B and 72B to match the performance of GPT-4o with full repo context and enable progress on the hardest LongCodeArena tasks. Project page: zaidkhan.me/MutaGReP
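To make the idea of searching in plan space more concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a best-first search over plans, where each plan is a sequence of natural language steps paired with the repository symbols retrieved for them. Everything named here is hypothetical: the tiny `REPO_SYMBOLS` table, the word-overlap `retrieve_symbols` function (standing in for a real symbol retriever), the rule-based `mutate` function (standing in for an LLM that proposes plan mutations), and the `score` function (standing in for a learned or LLM-based scorer).

```python
import heapq

# Hypothetical toy "repository": symbol name -> short description.
REPO_SYMBOLS = {
    "load_mesh": "load a mesh from a file",
    "simplify_mesh": "simplify a mesh by reducing its face count",
    "export_mesh": "export a mesh to a file",
    "plot_histogram": "plot a histogram of values",
}


def retrieve_symbols(step, k=1):
    """Toy symbol retriever: rank symbols by word overlap with the step text."""
    words = set(step.lower().split())
    scored = [
        (len(words & set(desc.split())), name)
        for name, desc in REPO_SYMBOLS.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for overlap, name in scored[:k] if overlap > 0]


def ground(step):
    """Pair a natural language step with the symbols retrieved for it."""
    return (step, tuple(retrieve_symbols(step)))


def mutate(plan, request):
    """Stand-in for an LLM mutation: append one not-yet-used clause of the request."""
    clauses = [clause.strip() for clause in request.split(",")]
    used = {step for step, _ in plan}
    return [plan + (ground(clause),) for clause in clauses if clause not in used]


def score(plan):
    """Stand-in scorer: number of distinct repository symbols the plan is grounded in."""
    return len({symbol for _, symbols in plan for symbol in symbols})


def plan_search(request, budget=10):
    """Best-first search over plans; expands at most `budget` nodes."""
    root = ()
    frontier = [(-score(root), 0, root)]  # (negated score, tie-breaker, plan)
    best, counter = root, 1
    for _ in range(budget):
        if not frontier:
            break
        _, _, plan = heapq.heappop(frontier)
        if score(plan) > score(best):
            best = plan
        for child in mutate(plan, request):
            heapq.heappush(frontier, (-score(child), counter, child))
            counter += 1
    return best


REQUEST = "load a mesh, simplify the mesh, export the mesh"
plan = plan_search(REQUEST)
for step, symbols in plan:
    print(f"{step} -> {', '.join(symbols)}")
```

Note that nothing in this sketch executes repository code: each plan is judged only by the symbols its steps are grounded in, which is the sense in which such a search can be execution-free. A realistic system would replace all three stand-ins with model-based components.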
