Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
October 14, 2024
Authors: Ziyue Li, Tianyi Zhou
cs.AI
Abstract
While large language models (LLMs) excel on generation tasks, their
decoder-only architecture often limits their potential as embedding models if
no further representation finetuning is applied. Does this contradict their
claim of being generalists? To answer this question, we take a closer look at
Mixture-of-Experts (MoE) LLMs. Our study shows that the expert routers in MoE
LLMs can serve as an off-the-shelf embedding model with promising performance
on a diverse class of embedding-focused tasks, without requiring any
finetuning. Moreover, our extensive analysis shows that the MoE routing weights
(RW) are complementary to the hidden state (HS) of LLMs, a widely used
embedding. Compared to HS, we find that RW is more robust to the choice of
prompts and focuses on high-level semantics. Motivated by the analysis, we
propose MoEE, which combines RW and HS and achieves better performance than
using either separately. Our exploration of their combination and prompting
strategies yields several novel insights, e.g., that a weighted sum of RW and
HS similarities outperforms the similarity computed on their concatenation.
Our experiments are
conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding
Benchmark (MTEB). The results demonstrate the significant improvement brought
by MoEE to LLM-based embedding without further finetuning.
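To make the comparison in the abstract concrete, below is a minimal sketch (not the authors' released implementation) of the two ways of combining routing-weight (RW) and hidden-state (HS) embeddings when scoring a sentence pair: a weighted sum of the two similarities versus a single similarity on the concatenated vectors. The pooled vectors, their dimensions, and the `alpha` weight are illustrative assumptions.

```python
# Sketch of MoEE-style similarity scoring; assumes `hs` and `rw` are vectors
# already pooled from an MoE LLM's hidden states and routing weights.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def moee_sum_similarity(hs1, rw1, hs2, rw2, alpha: float = 0.5) -> float:
    """Weighted sum of HS and RW similarities (the variant the abstract reports as stronger).
    `alpha` is a hypothetical weighting hyperparameter."""
    return alpha * cosine(hs1, hs2) + (1 - alpha) * cosine(rw1, rw2)

def moee_concat_similarity(hs1, rw1, hs2, rw2) -> float:
    """Similarity computed on the concatenated [HS; RW] embedding, for comparison."""
    return cosine(np.concatenate([hs1, rw1]), np.concatenate([hs2, rw2]))

# Toy usage with random vectors standing in for pooled LLM features.
rng = np.random.default_rng(0)
hs_a, hs_b = rng.normal(size=4096), rng.normal(size=4096)
rw_a, rw_b = rng.normal(size=256), rng.normal(size=256)
print(moee_sum_similarity(hs_a, rw_a, hs_b, rw_b))
print(moee_concat_similarity(hs_a, rw_a, hs_b, rw_b))
```

The weighted-sum variant keeps the two similarity scales separate, which avoids the high-dimensional HS vector dominating the much smaller RW vector when the two are simply concatenated.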