Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

October 14, 2024
Authors: Ziyue Li, Tianyi Zhou
cs.AI

Abstract

While large language models (LLMs) excel on generation tasks, their decoder-only architecture often limits their potential as embedding models if no further representation finetuning is applied. Does this contradict their claim of being generalists? To answer the question, we take a closer look at Mixture-of-Experts (MoE) LLMs. Our study shows that the expert routers in MoE LLMs can serve as an off-the-shelf embedding model with promising performance on a diverse class of embedding-focused tasks, without requiring any finetuning. Moreover, our extensive analysis shows that the MoE routing weights (RW) are complementary to the hidden state (HS) of LLMs, a widely used embedding. Compared to HS, we find that RW is more robust to the choice of prompts and focuses on high-level semantics. Motivated by this analysis, we propose MoEE, which combines RW and HS and achieves better performance than using either separately. Our exploration of their combination and prompting strategies yields several novel insights, e.g., a weighted sum of RW and HS similarities outperforms the similarity computed on their concatenation. Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results demonstrate the significant improvement MoEE brings to LLM-based embedding without further finetuning.
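The abstract's key recipe, scoring a sentence pair by a weighted sum of routing-weight (RW) and hidden-state (HS) cosine similarities rather than by the similarity of their concatenation, can be illustrated with a minimal sketch. The sketch below assumes the RW and HS vectors have already been extracted from an MoE LLM (e.g., mean-pooled router probabilities and a last-token hidden state); the dimensions, variable names, and the weighting coefficient `alpha` are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the MoEE (weighted-sum) idea described in the abstract.
# Assumptions (not from the paper): rw_a / rw_b are pooled routing-weight vectors,
# hs_a / hs_b are hidden-state embeddings already extracted from an MoE LLM,
# and alpha is a hypothetical weighting coefficient.
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def moee_similarity(rw_a, rw_b, hs_a, hs_b, alpha: float = 0.5) -> float:
    """Weighted sum of the RW and HS cosine similarities (MoEE, sum variant)."""
    return alpha * cosine(rw_a, rw_b) + (1.0 - alpha) * cosine(hs_a, hs_b)


def concat_similarity(rw_a, rw_b, hs_a, hs_b) -> float:
    """Baseline: cosine similarity on the concatenated [RW; HS] embedding."""
    return cosine(np.concatenate([rw_a, hs_a]), np.concatenate([rw_b, hs_b]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for extracted embeddings (illustrative dimensions only).
    rw_a, rw_b = rng.normal(size=256), rng.normal(size=256)
    hs_a, hs_b = rng.normal(size=4096), rng.normal(size=4096)
    print("MoEE (weighted sum):", moee_similarity(rw_a, rw_b, hs_a, hs_b))
    print("Concatenation      :", concat_similarity(rw_a, rw_b, hs_a, hs_b))
```

One plausible intuition, consistent with the abstract's finding, is that concatenation lets the higher-dimensional, larger-norm HS dominate a single cosine score, whereas the weighted sum normalizes each similarity separately before combining them.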

