Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Abstract

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge either as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA, an approach that is costly at repository scale and brittle as codebases evolve. We introduce Code2LoRA, a hypernetwork framework that generates repository-specific LoRA adapters, injecting repository knowledge with zero inference-time token overhead. Code2LoRA supports two usage scenarios: Code2LoRA-Static converts a single repository snapshot into an adapter, suitable for comprehension of stable codebases, while Code2LoRA-Evo maintains an adapter backed by a GRU hidden state that is updated with each code diff, suitable for active development of evolving codebases. To evaluate Code2LoRA against parameter-efficient fine-tuning baselines, we build RepoPeftBench, a benchmark of 604 Python repositories with two tracks: a static track with 40K training and 12K test assertion-completion tasks, and an evolution track with 215K training and 87K test tasks, all derived from commits. On the static track, Code2LoRA-Static achieves 63.8% cross-repo and 66.2% in-repo exact match, matching the per-repository LoRA upper bound; on the evolution track, Code2LoRA-Evo achieves 60.3% cross-repo exact match (+5.2 pp over a single shared LoRA). The Code2LoRA code is available at https://anonymous.4open.science/r/code2lora-6857; the model checkpoints and RepoPeftBench datasets are available at https://huggingface.co/code2lora.
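
To make the two usage scenarios concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes things the abstract does not specify: that a repository snapshot and each code diff are already encoded as fixed-size vectors, that a single linear head maps a state vector to the LoRA factors A and B, and that the adapter is applied to one frozen weight matrix. All names and sizes (`GRUCell`, `AdapterHead`, `D_MODEL`, and so on) are hypothetical.

```python
import math
import random

random.seed(0)

# Toy sizes; all hypothetical and far smaller than any real model.
D_MODEL, RANK, D_EMB, D_HID = 8, 2, 6, 5


def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def rand_vector(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Standard GRU cell (biases omitted for brevity)."""

    def __init__(self, d_in, d_hid):
        self.Wz, self.Uz = rand_matrix(d_hid, d_in), rand_matrix(d_hid, d_hid)
        self.Wr, self.Ur = rand_matrix(d_hid, d_in), rand_matrix(d_hid, d_hid)
        self.Wh, self.Uh = rand_matrix(d_hid, d_in), rand_matrix(d_hid, d_hid)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the old state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


class AdapterHead:
    """Assumed linear hypernetwork head: state -> A (RANK x D_MODEL), B (D_MODEL x RANK)."""

    def __init__(self, d_state):
        self.to_a = rand_matrix(RANK * D_MODEL, d_state)
        self.to_b = rand_matrix(D_MODEL * RANK, d_state)

    def generate(self, state):
        a_flat, b_flat = matvec(self.to_a, state), matvec(self.to_b, state)
        A = [a_flat[i * D_MODEL:(i + 1) * D_MODEL] for i in range(RANK)]
        B = [b_flat[i * RANK:(i + 1) * RANK] for i in range(D_MODEL)]
        return A, B


def adapted_forward(W, A, B, x):
    """y = W x + B (A x): frozen weight plus the generated low-rank update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]


W_frozen = rand_matrix(D_MODEL, D_MODEL)  # stands in for one frozen LM weight
x = rand_vector(D_MODEL)                  # stands in for one activation vector

# Static-style use: one snapshot embedding -> one adapter.
static_head = AdapterHead(D_EMB)
A_s, B_s = static_head.generate(rand_vector(D_EMB))
y_static = adapted_forward(W_frozen, A_s, B_s, x)

# Evo-style use: the hidden state is updated per diff, and the adapter
# is regenerated from the current state after each update.
gru, evo_head = GRUCell(D_EMB, D_HID), AdapterHead(D_HID)
h = [0.0] * D_HID
evo_outputs = []
for diff_embedding in [rand_vector(D_EMB) for _ in range(3)]:
    h = gru.step(diff_embedding, h)
    A_e, B_e = evo_head.generate(h)
    evo_outputs.append(adapted_forward(W_frozen, A_e, B_e, x))
```

In this sketch the prompt itself never grows: repository information reaches the model only through the generated factors A and B, which is one way to read the "zero inference-time token overhead" claim. In the Evo-style loop, processing a new diff costs one GRU step plus one pass through the head, rather than a fresh fine-tuning run.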