Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
November 4, 2025
Authors: Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim, Jaehyun Park
cs.AI
Abstract
Query augmentation makes queries more meaningful by appending supplementary
information, helping them match relevant documents. Recent studies have
proposed Large Language Model (LLM)-based embedders that learn representations
for embedding and generation for query augmentation in a multi-task manner,
leveraging the generative capabilities of LLMs. During inference, these jointly
trained embedders perform query augmentation followed by embedding, showing
effective results. However, augmenting every query incurs substantial embedding
latency, and augmentation can even be detrimental to retrieval performance for
some queries. Moreover, previous methods have not been explored in multimodal
environments. To tackle these problems, we propose M-Solomon, a universal
multimodal embedder that adaptively determines when to augment queries. Our
approach first divides the queries of the training datasets into two groups at
the dataset level: one containing queries that require augmentation and one
containing queries that do not. Then, we introduce a synthesis process that
generates appropriate augmentations for the queries that require them by
leveraging a powerful Multimodal LLM (MLLM). Next, we present adaptive query
augmentation: M-Solomon learns to generate synthetic augmentations with the
prefix /augment for queries that demand them and to generate the simple string
/embed for the others, so it conducts query augmentation only when necessary.
Experimental results show that M-Solomon not only surpasses the baseline
without augmentation by a large margin but also outperforms the baseline that
always uses augmentation, while offering much lower embedding latency.
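The adaptive mechanism described above (emit /augment followed by a synthetic augmentation, or the bare marker /embed) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy embedding, and the length heuristic standing in for the trained model's decision are all hypothetical.

```python
# Minimal sketch of adaptive query augmentation at inference time.
# All names here (generate, embed, the way the /augment payload is
# consumed) are illustrative assumptions, not the paper's actual code.

def generate(query: str) -> str:
    """Stand-in for the embedder's generative pass.

    A trained M-Solomon-style model would emit either the literal string
    "/embed" or "/augment <synthetic augmentation>". We fake the decision
    with a trivial length heuristic purely for demonstration.
    """
    if len(query.split()) < 3:  # pretend short queries need augmentation
        return f"/augment additional context about {query}"
    return "/embed"

def embed(text: str) -> list[float]:
    """Stand-in for the dense embedding pass (returns a toy vector)."""
    return [float(len(text))]

def adaptive_embed(query: str) -> list[float]:
    decision = generate(query)
    if decision.startswith("/augment"):
        # An augmentation was produced: embed the query together with it.
        augmentation = decision[len("/augment"):].strip()
        return embed(query + " " + augmentation)
    # "/embed": skip augmentation, avoiding the extra generation latency.
    return embed(query)

print(adaptive_embed("red shoes"))                           # augmented path
print(adaptive_embed("summer running shoes for flat feet"))  # direct path
```

The key point is that the generation step itself decides the branch: only queries routed down the /augment path pay the cost of producing and embedding extra text, which is how the adaptive variant keeps latency below an always-augment baseline.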