MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol
May 8, 2026
Authors: Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim
cs.AI
Abstract
Multilingual Information Retrieval is increasingly important in real-world search settings, where users issue queries over mixed-language corpora. Existing evaluations mainly reward language-agnostic semantic relevance, treating relevant passages equally regardless of language. Yet retrieval utility also depends on the language of the retrieved passages: users may prefer results they can read and verify in the query language, and query-passage language mismatch can complicate downstream grounding and answer verification in Retrieval-Augmented Generation systems. To evaluate this language-aware dimension, we introduce MLAIRE, a Multilingual Language-Aware Information Retrieval Evaluation protocol that disentangles cross-lingual semantic retrieval from query-language preference. MLAIRE constructs controlled pools with parallel passages across languages, enabling separate measurement of semantic retrieval accuracy and query-language preference when equivalent translations are available. We propose language-aware metrics, including Language Preference Rate (LPR) and Lang-nDCG, together with a 4-way decomposition that separates semantic retrieval failures from query-language preference failures. Evaluating 31 dense, sparse, and late-interaction retrievers, we show that standard metrics obscure distinct behaviors: semantically strong retrievers may return correct content in a non-query language, while retrievers with stronger query-language preference may retrieve less semantically relevant passages.
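The abstract names a 4-way decomposition and a Language Preference Rate but does not define them, so the following is only an illustrative sketch of one plausible reading, not the authors' formulation. It assumes each query has a single gold passage available in several translations, looks only at the top-ranked result, and treats LPR as the share of semantically correct top-1 results that are in the query language. The names `Passage`, `classify_top1`, `decompose`, and `language_preference_rate`, as well as the four category labels, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    content_id: str  # assumed: shared by all translations of the same passage
    lang: str        # language code of this particular translation


def classify_top1(top1: Passage, gold_content_id: str, query_lang: str) -> str:
    """Place one top-ranked result into one of four (assumed) outcome cells."""
    semantic_ok = top1.content_id == gold_content_id
    language_ok = top1.lang == query_lang
    if semantic_ok and language_ok:
        return "correct_content_query_lang"
    if semantic_ok:
        return "correct_content_other_lang"
    if language_ok:
        return "wrong_content_query_lang"
    return "wrong_content_other_lang"


def decompose(results) -> Counter:
    """Count outcome cells over (top1, gold_content_id, query_lang) triples."""
    return Counter(classify_top1(*triple) for triple in results)


def language_preference_rate(counts: Counter) -> float:
    """Assumed LPR: among semantically correct top-1 hits, the fraction
    returned in the query language. Returns 0.0 if there are no hits."""
    hits = counts["correct_content_query_lang"] + counts["correct_content_other_lang"]
    return counts["correct_content_query_lang"] / hits if hits else 0.0


# Toy data: four Korean queries with made-up passage identifiers.
example_results = [
    (Passage("p1", "ko"), "p1", "ko"),  # right content, query language
    (Passage("p2", "en"), "p2", "ko"),  # right content, wrong language
    (Passage("p9", "ko"), "p3", "ko"),  # wrong content, query language
    (Passage("p4", "ko"), "p4", "ko"),  # right content, query language
]
example_counts = decompose(example_results)
example_lpr = language_preference_rate(example_counts)
print(dict(example_counts), round(example_lpr, 3))
```

In this toy run a language-agnostic top-1 accuracy would score three of four queries as correct, while the decomposition additionally shows that one of those three hits came back in a non-query language, which is the kind of distinction the abstract says standard metrics obscure.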