Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
November 4, 2025
Authors: Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim, Jaehyun Park
cs.AI
Abstract
Query augmentation makes queries more meaningful by appending supplementary
information, helping them match relevant documents. Recent studies have
proposed Large Language Model (LLM)-based embedders that learn representations
for embedding and generation for query augmentation in a multi-task manner,
leveraging the generative capabilities of LLMs. During inference, these jointly
trained embedders perform query augmentation followed by embedding, showing
effective results. However, augmenting every query incurs substantial embedding
latency, and augmentation can even be detrimental to retrieval performance for
some queries. Moreover, previous methods have not been explored in multimodal
environments. To tackle these problems, we propose M-Solomon, a universal
multimodal embedder that adaptively determines when to augment queries. Our
approach first divides the queries of the training datasets into two groups at
the dataset level: one containing queries that require augmentation and one
containing queries that do not. Then, we introduce a synthesis process that
generates appropriate augmentations for the queries that require them by
leveraging a powerful Multimodal LLM (MLLM). Next, we present adaptive query
augmentation: M-Solomon learns to generate synthetic augmentations with the
prefix /augment for queries that demand them and to generate the simple string
/embed for the others, so it conducts query augmentation only when necessary.
Experimental results show that M-Solomon not only surpasses the baseline
without augmentation by a large margin but also outperforms the baseline that
always uses augmentation, while offering much lower embedding latency.
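The adaptive mechanism described above (emit /augment followed by a synthetic augmentation, or the bare marker /embed) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy embedding, and the length heuristic standing in for the trained model's decision are all hypothetical.

```python
# Minimal sketch of adaptive query augmentation at inference time.
# All names here (generate, embed, the way the /augment payload is
# consumed) are illustrative assumptions, not the paper's actual code.

def generate(query: str) -> str:
    """Stand-in for the embedder's generative pass.

    A trained M-Solomon-style model would emit either the literal string
    "/embed" or "/augment <synthetic augmentation>". We fake the decision
    with a trivial length heuristic purely for demonstration.
    """
    if len(query.split()) < 3:  # pretend short queries need augmentation
        return f"/augment additional context about {query}"
    return "/embed"

def embed(text: str) -> list[float]:
    """Stand-in for the dense embedding pass (returns a toy vector)."""
    return [float(len(text))]

def adaptive_embed(query: str) -> list[float]:
    decision = generate(query)
    if decision.startswith("/augment"):
        # An augmentation was produced: embed the query together with it.
        augmentation = decision[len("/augment"):].strip()
        return embed(query + " " + augmentation)
    # "/embed": skip augmentation, avoiding the extra generation latency.
    return embed(query)

print(adaptive_embed("red shoes"))                           # augmented path
print(adaptive_embed("summer running shoes for flat feet"))  # direct path
```

The key point is that the generation step itself decides the branch: only queries routed down the /augment path pay the cost of producing and embedding extra text, which is how the adaptive variant keeps latency below an always-augment baseline.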