
Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

November 4, 2025
作者: Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim, Jaehyun Park
cs.AI

Abstract
Query augmentation makes queries more meaningful by appending further information to them so that relevant documents can be retrieved more effectively. Current studies have proposed Large Language Model (LLM)-based embedders, which learn representations for embedding and generation for query augmentation in a multi-task manner by leveraging the generative capabilities of LLMs. During inference, these jointly trained embedders conduct query augmentation followed by embedding, showing effective results. However, augmenting every query leads to substantial embedding latency, and augmentation can even be detrimental to performance for some queries. Also, previous methods have not been explored in multimodal environments. To tackle these problems, we propose M-Solomon, a universal multimodal embedder that can adaptively determine when to augment queries. Our approach first divides the queries of the training datasets into two groups at the dataset level: one contains queries that require augmentation, and the other contains queries that do not. Then, we introduce a synthesis process that generates appropriate augmentations for queries that require them by leveraging a powerful Multimodal LLM (MLLM). Next, we present adaptive query augmentation. Through this step, M-Solomon can conduct query augmentation only when necessary, by learning to generate synthetic augmentations with the prefix /augment for queries that demand them and the simple string /embed for the others. Experimental results showed that M-Solomon not only surpassed the baseline without augmentation by a large margin but also outperformed the baseline that always used augmentation, while providing much lower embedding latency.
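The adaptive inference loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model interface (`generate`, `encode`) and the exact handling of the /augment and /embed control strings are assumptions.

```python
def embed_query(model, query: str):
    """Embed a query, augmenting it only when the embedder asks for it.

    Hypothetical sketch of M-Solomon-style adaptive query augmentation:
    the jointly trained embedder first generates either the string
    "/embed" (embed the query as-is) or "/augment <augmentation>"
    (append the synthetic augmentation before embedding).
    """
    generated = model.generate(query)  # e.g. "/embed" or "/augment ..."
    if generated.startswith("/augment"):
        # Append the generated augmentation to the query before embedding.
        augmentation = generated[len("/augment"):].strip()
        if augmentation:
            query = f"{query} {augmentation}"
    # For "/embed", the original query is embedded directly, avoiding
    # the latency of generating an augmentation token-by-token.
    return model.encode(query)
```

Because most queries take the short "/embed" path, the per-query generation cost is paid only when the model has learned that augmentation actually helps.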
December 1, 2025