Enhanced Arabic Text Retrieval with Attentive Relevance Scoring
July 31, 2025
Authors: Salah Eddine Bekhouche, Azeddine Benlamoudi, Yazid Bounab, Fadi Dornaika, Abdenour Hadid
cs.AI
Abstract
Arabic poses a particular challenge for natural language processing (NLP) and
information retrieval (IR) due to its complex morphology, optional diacritics
and the coexistence of Modern Standard Arabic (MSA) and various dialects.
Despite the growing global significance of Arabic, it is still underrepresented
in NLP research and benchmark resources. In this paper, we present an enhanced
Dense Passage Retrieval (DPR) framework developed specifically for Arabic. At
the core of our approach is a novel Attentive Relevance Scoring (ARS) mechanism,
which replaces standard interaction mechanisms with an adaptive scoring function
that more effectively models the semantic relevance between questions and passages.
Our method integrates pre-trained Arabic language models and architectural
refinements to improve retrieval performance and significantly increase ranking
accuracy when answering Arabic questions. The code is made publicly available
at https://github.com/Bekhouche/APR.
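
The abstract does not state the ARS formulation, so the following is only a minimal sketch of the general idea: a fixed dot-product interaction, as commonly used in DPR, contrasted with a learnable scoring function over the same question and passage embeddings. The additive form `v . tanh(Wq q + Wp p)`, the names `attentive_score`, `Wq`, `Wp`, `v`, and the toy embeddings are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import math
import random


def dot_score(q, p):
    """DPR-style relevance: inner product of question and passage embeddings."""
    return sum(qi * pi for qi, pi in zip(q, p))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attentive_score(q, p, Wq, Wp, v):
    """Hypothetical adaptive score: v . tanh(Wq q + Wp p).

    In a real system Wq, Wp, and v would be trained; here they are random.
    """
    hidden = [math.tanh(a + b) for a, b in zip(matvec(Wq, q), matvec(Wp, p))]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def rank(q, passages, score_fn):
    """Return passage indices ordered from most to least relevant."""
    return sorted(range(len(passages)),
                  key=lambda i: score_fn(q, passages[i]),
                  reverse=True)


# Toy setup: 4-dimensional embeddings, 3 hidden units (illustrative values only).
rng = random.Random(0)
dim, hid = 4, 3
Wq = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hid)]
Wp = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hid)]
v = [rng.uniform(-1, 1) for _ in range(hid)]

q = [1.0, 0.0, 0.5, 0.0]
passages = [
    [0.9, 0.1, 0.4, 0.0],    # close to the question
    [0.0, 1.0, 0.0, 0.2],    # orthogonal to the question
    [-1.0, 0.0, -0.5, 0.0],  # opposite to the question
]

dot_ranking = rank(q, passages, dot_score)
ars_ranking = rank(q, passages,
                   lambda a, b: attentive_score(a, b, Wq, Wp, v))
print("dot-product ranking:", dot_ranking)
print("attentive ranking (untrained weights):", ars_ranking)
```

With untrained weights the attentive ranking carries no meaning; the sketch only shows where a learned scoring function would slot in as a replacement for the fixed inner product.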