Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

Abstract

Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless the scores are fused with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.
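To make the distinction between fusing scores with tuned weights and other fusion methods concrete, the following is a minimal sketch of two common fusion strategies. It is illustrative only: the abstract does not specify which fusion methods or normalization were used, so min-max normalization, reciprocal rank fusion with `k=60`, the function names, the document identifiers, and the weights are all assumptions made here for the example.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] so systems are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_score_fusion(runs, weights):
    """Weighted sum of normalized scores; a missing document contributes 0."""
    fused = defaultdict(float)
    for name, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[name] * s
    return sorted(fused, key=fused.get, reverse=True)


def reciprocal_rank_fusion(runs, k=60):
    """Rank-based fusion: each system adds 1 / (k + rank) per document."""
    fused = defaultdict(float)
    for scores in runs.values():
        ranking = sorted(scores, key=scores.get, reverse=True)
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores from a lexical and a dense retriever for one query.
runs = {
    "lexical": {"art_12": 14.2, "art_7": 9.1, "art_3": 2.5},
    "dense": {"art_7": 0.83, "art_3": 0.80, "art_12": 0.41},
}
# Hypothetical weights; in practice these would be tuned on held-out data.
weights = {"lexical": 0.3, "dense": 0.7}

print(weighted_score_fusion(runs, weights))
print(reciprocal_rank_fusion(runs))
```

The weighted variant depends on the choice of weights, whereas the rank-based variant has no per-system weights to tune. That difference is one way to read the contrast the abstract draws for the in-domain setting.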
