Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

Abstract

Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless fusing scores with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.
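To make the distinction between fusion strategies concrete, here is a minimal Python sketch of two common ways to combine the outputs of a lexical and a dense retriever. The abstract does not say which fusion functions or normalization schemes were used, so both are assumptions: a weighted sum of min-max normalized scores (the kind of score fusion whose weights would need tuning) and reciprocal rank fusion, which ignores raw scores. The function names, the toy document identifiers, and the constant `k=60` are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to [0, 1]; a constant map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_score_fusion(runs, weights):
    """Weighted sum of normalized scores; a doc missing from a run contributes 0."""
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
    # Sort by fused score (descending), breaking ties by doc id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


def reciprocal_rank_fusion(runs, k=60):
    """Sum 1 / (k + rank) over runs; only rank positions matter, not raw scores."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=lambda doc: (-run[doc], doc))
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores for one query from a lexical and a dense retriever.
lexical_run = {"art_12": 14.2, "art_7": 9.1, "art_3": 4.0}
dense_run = {"art_7": 0.83, "art_3": 0.80, "art_40": 0.35}

balanced = weighted_score_fusion([lexical_run, dense_run], [0.5, 0.5])
lexical_heavy = weighted_score_fusion([lexical_run, dense_run], [0.9, 0.1])
rank_based = reciprocal_rank_fusion([lexical_run, dense_run])

print([doc for doc, _ in balanced])
print([doc for doc, _ in lexical_heavy])
print([doc for doc, _ in rank_based])
```

On this toy input, moving the weights from (0.5, 0.5) to (0.9, 0.1) changes which document is ranked first, which illustrates why score fusion is sensitive to how the weights are chosen. Rank fusion has no such weights but discards the score margins between documents.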
