arXiv: 2606.06492v1

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

June 4, 2026
Authors: Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie
cs.SE, cs.AI, cs.CL

Abstract

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA, approaches that are costly at repository scale and brittle to evolving codebases. We introduce Code2LoRA, a hypernetwork framework that generates repository-specific LoRA adapters, injecting repository knowledge with zero inference-time token overhead. Code2LoRA supports two usage scenarios: Code2LoRA-Static converts a single repository snapshot into an adapter, which suits comprehension of stable codebases, while Code2LoRA-Evo maintains an adapter backed by a GRU hidden state that is updated with each code diff, which suits active development of evolving codebases. To evaluate Code2LoRA against parameter-efficient fine-tuning baselines, we build RepoPeftBench, a benchmark of 604 Python repositories with two tracks: a static track with 40K training and 12K test assertion-completion tasks, and an evolution track with 215K commit-derived training and 87K commit-derived test tasks. On the static track, Code2LoRA-Static achieves 63.8% cross-repo and 66.2% in-repo exact match, matching the per-repository LoRA upper bound; on the evolution track, Code2LoRA-Evo achieves 60.3% cross-repo exact match (+5.2 pp over a single shared LoRA). Code2LoRA's code is available at https://anonymous.4open.science/r/code2lora-6857; the model checkpoints and RepoPeftBench datasets are available at https://huggingface.co/code2lora.
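
To make the two scenarios in the abstract concrete, here is a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes the repository snapshot and each diff have already been encoded as fixed-size vectors (random stand-ins below). The class names `LoRAHead` and `GRUCell`, the function `adapted_forward`, all dimensions, and the bias-free textbook GRU update are illustrative assumptions. The abstract does not specify the encoder, the head architecture, or which layers are adapted.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not values from the paper.
D_EMB, D_HID, D_MODEL, RANK = 16, 32, 64, 4


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LoRAHead:
    """Hypothetical linear head mapping a conditioning vector to LoRA factors."""

    def __init__(self, d_in):
        self.W_a = rng.normal(0.0, 0.05, (RANK * D_MODEL, d_in))
        self.W_b = rng.normal(0.0, 0.05, (D_MODEL * RANK, d_in))

    def __call__(self, z):
        A = (self.W_a @ z).reshape(RANK, D_MODEL)  # down-projection factor
        B = (self.W_b @ z).reshape(D_MODEL, RANK)  # up-projection factor
        return A, B


class GRUCell:
    """Textbook GRU cell without biases (an assumption; the paper may differ)."""

    def __init__(self, d_in, d_hid):
        def init(rows, cols):
            return rng.normal(0.0, 0.1, (rows, cols))

        self.Wz, self.Uz = init(d_hid, d_in), init(d_hid, d_hid)
        self.Wr, self.Ur = init(d_hid, d_in), init(d_hid, d_hid)
        self.Wh, self.Uh = init(d_hid, d_in), init(d_hid, d_hid)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)            # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)            # reset gate
        h_new = np.tanh(self.Wh @ x + self.Uh @ (r * h))  # candidate state
        return (1.0 - z) * h + z * h_new


def adapted_forward(x, W0, A, B):
    """Frozen layer plus low-rank update, i.e. x @ (W0 + B @ A).T,
    computed without materializing the full matrix B @ A."""
    return x @ W0.T + (x @ A.T) @ B.T


# Static setting: one snapshot embedding is mapped to one adapter.
static_head = LoRAHead(D_EMB)
repo_embedding = rng.normal(size=D_EMB)  # stand-in for an encoded snapshot
A_static, B_static = static_head(repo_embedding)

# Evolving setting: fold each diff embedding into a hidden state,
# then decode the current state into a fresh adapter.
gru = GRUCell(D_EMB, D_HID)
evo_head = LoRAHead(D_HID)
h = np.zeros(D_HID)
adapters = []
for diff_embedding in rng.normal(size=(3, D_EMB)):  # stand-ins for three diffs
    h = gru.step(diff_embedding, h)
    adapters.append(evo_head(h))

# The input activations are unchanged; only the layer weights are repo-specific.
W0 = rng.normal(0.0, 0.1, (D_MODEL, D_MODEL))  # frozen base weight
x = rng.normal(size=(2, D_MODEL))              # two token activations
y = adapted_forward(x, W0, A_static, B_static)
```

In this sketch the repository knowledge lives entirely in `A` and `B`, so the prompt carries no extra tokens. That is one way to read the "zero inference-time token overhead" claim. In the evolving setting, a new commit costs one `gru.step` call and one head evaluation rather than a new fine-tuning run.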