Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

September 2, 2024
Authors: Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
cs.AI

Abstract

Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless fusing scores with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.
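
The abstract contrasts fusion methods in general with score fusion using tuned weights, but it does not spell out the formulas. As a rough illustration only, the sketch below shows two commonly used fusion schemes: a weighted sum of min-max normalized scores, and reciprocal rank fusion (RRF), which ignores raw scores and uses ranks. The function names, the document identifiers, and the constant `k=60` are assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List, Sequence


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores to [0, 1] so different systems are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: no ranking signal, so contribute nothing.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_score_fusion(
    runs: Sequence[Dict[str, float]], weights: Sequence[float]
) -> List[str]:
    """Rank documents by a weighted sum of normalized scores (hypothetical helper)."""
    fused: Dict[str, float] = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    # Sort by fused score, breaking ties by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def reciprocal_rank_fusion(runs: Sequence[Dict[str, float]], k: int = 60) -> List[str]:
    """Rank documents by the sum of 1 / (k + rank) over all runs (k is an assumed default)."""
    fused: Dict[str, float] = {}
    for run in runs:
        ranked = sorted(run, key=lambda d: (-run[d], d))
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy scores from a lexical and a dense retriever (made-up values).
lexical_run = {"art_1": 12.0, "art_2": 7.0, "art_3": 2.0}
dense_run = {"art_2": 0.9, "art_3": 0.8, "art_4": 0.1}

print(weighted_score_fusion([lexical_run, dense_run], [0.3, 0.7]))
print(weighted_score_fusion([lexical_run, dense_run], [0.9, 0.1]))
print(reciprocal_rank_fusion([lexical_run, dense_run]))
```

In this toy case, changing the weights changes which document is ranked first, which hints at why weight tuning could matter when one in-domain system is much stronger than the others.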
