MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
June 3, 2025
Authors: Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li
cs.AI
Abstract
Semantic retrieval is crucial for modern applications yet remains
underexplored in current research. Existing datasets are limited to single
languages, single images, or singular retrieval conditions, often failing to
fully exploit the expressive capacity of visual information, as evidenced by
performance that holds up even when images are replaced with captions. However,
practical retrieval scenarios frequently involve interleaved multi-condition
queries with multiple images. Hence, this paper introduces MERIT, the first
multilingual dataset for interleaved multi-condition semantic retrieval,
comprising 320,000 queries with 135,000 products in 5 languages, covering 7
distinct product categories. Extensive experiments on MERIT identify a key
limitation of existing models: they focus solely on global semantic information
while neglecting the specific conditional elements in queries. Consequently, we propose
Coral, a novel fine-tuning framework that adapts pre-trained MLLMs by
integrating embedding reconstruction to preserve fine-grained conditional
elements and contrastive learning to extract comprehensive global semantics.
Experiments demonstrate that Coral achieves a 45.9% performance improvement
over conventional approaches on MERIT, with strong generalization capabilities
validated across 8 established retrieval benchmarks. Collectively, our
contributions - a novel dataset, identification of critical limitations in
existing approaches, and an innovative fine-tuning framework - establish a
foundation for future research in interleaved multi-condition semantic
retrieval.
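
The abstract describes Coral as combining contrastive learning (for global semantics) with embedding reconstruction (to preserve fine-grained conditional elements). The sketch below illustrates what such a combined training objective could look like in PyTorch; it is not the authors' implementation, and the function name `coral_style_loss`, the `reconstructor` module, and all hyperparameters are hypothetical.

```python
# Illustrative sketch only: a combined objective in the spirit of Coral,
# pairing an in-batch InfoNCE contrastive term over pooled query/product
# embeddings with an embedding-reconstruction term that decodes token-level
# states back from the pooled query embedding.
import torch
import torch.nn.functional as F


def coral_style_loss(query_emb: torch.Tensor,
                     product_emb: torch.Tensor,
                     token_states: torch.Tensor,
                     reconstructor: torch.nn.Module,
                     temperature: float = 0.07,
                     recon_weight: float = 0.5) -> torch.Tensor:
    """query_emb, product_emb: (B, D) pooled embeddings of matched pairs.
    token_states: (B, T, D) fine-grained token states to be preserved.
    reconstructor: a small decoder mapping the pooled query embedding back
    to token states (its exact form is an assumption for illustration)."""
    # Contrastive (InfoNCE) term: in-batch negatives, matched pairs on the diagonal.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(product_emb, dim=-1)
    logits = q @ p.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    contrastive = F.cross_entropy(logits, targets)

    # Reconstruction term: penalize losing condition-specific detail during pooling.
    recon = reconstructor(query_emb)                      # expected shape (B, T, D)
    reconstruction = F.mse_loss(recon, token_states)

    return contrastive + recon_weight * reconstruction


# Example usage with random tensors (all shapes are illustrative only).
B, T, D = 4, 16, 256
reconstructor = torch.nn.Sequential(torch.nn.Linear(D, T * D),
                                    torch.nn.Unflatten(1, (T, D)))
loss = coral_style_loss(torch.randn(B, D), torch.randn(B, D),
                        torch.randn(B, T, D), reconstructor)
loss.backward()
```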