Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
October 14, 2024
Authors: Ziyue Li, Tianyi Zhou
cs.AI
Abstract
While large language models (LLMs) excel on generation tasks, their
decoder-only architecture often limits their potential as embedding models if
no further representation finetuning is applied. Does this contradict their
claim of being generalists? To answer this question, we take a closer look at
Mixture-of-Experts (MoE) LLMs. Our study shows that the expert routers in MoE
LLMs can serve as an off-the-shelf embedding model with promising performance
on a diverse class of embedding-focused tasks, without requiring any
finetuning. Moreover, our extensive analysis shows that the MoE routing weights
(RW) are complementary to the hidden state (HS) of LLMs, a widely used
embedding. Compared to HS, we find that RW is more robust to the choice of
prompts and focuses on high-level semantics. Motivated by the analysis, we
propose MoEE, which combines RW and HS and achieves better performance than
using either separately. Our exploration of their combination and prompting
strategies yields several novel insights, e.g., that a weighted sum of RW and
HS similarities outperforms the similarity computed on their concatenation.
Our experiments are
conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding
Benchmark (MTEB). The results demonstrate the significant improvement brought
by MoEE to LLM-based embedding without further finetuning.
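To make the comparison in the abstract concrete, below is a minimal sketch (not the authors' released implementation) of the two ways of combining routing-weight (RW) and hidden-state (HS) embeddings when scoring a sentence pair: a weighted sum of the two similarities versus a single similarity on the concatenated vectors. The pooled vectors, their dimensions, and the `alpha` weight are illustrative assumptions.

```python
# Sketch of MoEE-style similarity scoring; assumes `hs` and `rw` are vectors
# already pooled from an MoE LLM's hidden states and routing weights.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def moee_sum_similarity(hs1, rw1, hs2, rw2, alpha: float = 0.5) -> float:
    """Weighted sum of HS and RW similarities (the variant the abstract reports as stronger).
    `alpha` is a hypothetical weighting hyperparameter."""
    return alpha * cosine(hs1, hs2) + (1 - alpha) * cosine(rw1, rw2)

def moee_concat_similarity(hs1, rw1, hs2, rw2) -> float:
    """Similarity computed on the concatenated [HS; RW] embedding, for comparison."""
    return cosine(np.concatenate([hs1, rw1]), np.concatenate([hs2, rw2]))

# Toy usage with random vectors standing in for pooled LLM features.
rng = np.random.default_rng(0)
hs_a, hs_b = rng.normal(size=4096), rng.normal(size=4096)
rw_a, rw_b = rng.normal(size=256), rng.normal(size=256)
print(moee_sum_similarity(hs_a, rw_a, hs_b, rw_b))
print(moee_concat_similarity(hs_a, rw_a, hs_b, rw_b))
```

The weighted-sum variant keeps the two similarity scales separate, which avoids the high-dimensional HS vector dominating the much smaller RW vector when the two are simply concatenated.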