
Beyond Retrieval: A Multitask Benchmark and Model for Code Search

May 6, 2026
Authors: Siqiao Xue, Zihan Liao, Jin Qin, Ziyin Zhang, Yixiang Mu, Fan Zhou, Hang Yu
cs.AI

Abstract

Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retrieval to cover the full code search pipeline. CoREB is built from counterfactually rewritten LiveCodeBench problems in five programming languages and delivered as timed releases with graded relevance judgments. We benchmark eleven embedding models and five rerankers across three tasks: text-to-code, code-to-text, and code-to-code. Our experiments reveal that: (1) code-specialised embeddings dominate code-to-code retrieval (roughly 2× over general encoders), yet no single model wins all three tasks; (2) short keyword queries, the format closest to real developer search, collapse every model to near-zero nDCG@10; (3) off-the-shelf rerankers are task-asymmetric, with a 12-point swing on code-to-code and no baseline net-positive across all tasks; (4) our fine-tuned CoREB-Reranker is the first to achieve consistent gains across all three tasks. The data and model are released.
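
For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@10 can be computed over graded relevance labels. It assumes the common linear-gain, log2-discount formulation; the abstract does not state which variant CoREB uses, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative, not taken from the released code.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, assumed)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    all_relevances: Optional[Sequence[float]] = None,
    k: int = 10,
) -> float:
    """nDCG@k for one query.

    ranked_relevances: graded labels of the returned items, in ranked order.
    all_relevances: graded labels of every judged item for the query; if
        omitted, the ideal ranking is built from the returned items only.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        # No relevant item exists for this query; score it as 0 by convention.
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # A ranking that places the only relevant item second instead of first.
    print(ndcg_at_k([0, 3]))
```

Under this formulation, a score near zero means that the highly graded items rarely appear in the top ten results, or appear only at the lowest of those ranks.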