Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

September 2, 2024
Authors: Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
cs.AI

Abstract

Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless fusing scores with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.
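
The abstract distinguishes score fusion with tuned weights from other fusion methods without naming them. As a rough illustration only, the Python sketch below shows two widely used approaches: a weighted sum of min-max normalized scores, and reciprocal rank fusion (RRF), which uses ranks alone. The function names, the article identifiers, the example scores, and the constant k=60 are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so no document can be preferred.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_score_fusion(runs, weights):
    """Fuse runs by a weighted sum of their normalized scores.

    A document missing from a run contributes 0 for that run.
    """
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


def reciprocal_rank_fusion(runs, k=60):
    """Fuse runs using ranks only: each run adds 1 / (k + rank)."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores from a lexical and a dense retriever for one query.
lexical = {"art_12": 14.2, "art_7": 9.1, "art_3": 2.0}
dense = {"art_7": 0.83, "art_40": 0.79, "art_12": 0.41}

dense_heavy = weighted_score_fusion([lexical, dense], weights=[0.3, 0.7])
lexical_heavy = weighted_score_fusion([lexical, dense], weights=[0.9, 0.1])
rrf = reciprocal_rank_fusion([lexical, dense])

print([doc for doc, _ in dense_heavy])
print([doc for doc, _ in lexical_heavy])
print([doc for doc, _ in rrf])
```

With these made-up scores, changing the weights changes which article ranks first, which gives some intuition for why weight tuning could matter when one retriever is much stronger than the other. RRF has no per-system weights in this form, so it cannot favor the stronger system.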
