
KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

January 3, 2026
Authors: Yixuan Tang, Yi Yang
cs.AI

Abstract

While LLMs are powerful embedding backbones, their application in training-free settings faces two structural challenges: causal attention restricts early tokens from accessing subsequent context, and the next-token prediction objective biases representations toward generation rather than semantic compression. To address these limitations, we propose KV-Embedding, a framework that activates the latent representation power of frozen LLMs. Our method leverages the observation that the key-value (KV) states of the final token at each layer encode a compressed view of the sequence. By re-routing these states as a prepended prefix, we enable all tokens to access sequence-level context within a single forward pass. To ensure model-agnostic applicability, we introduce an automated layer selection strategy based on intrinsic dimensionality. Evaluations on MTEB across Qwen, Mistral, and Llama backbones show that KV-Embedding outperforms existing training-free baselines by up to 10%, while maintaining robust performance on sequences up to 4,096 tokens. These results demonstrate that internal state manipulation offers an efficient alternative to input modification, and we hope this work encourages further exploration of LLM internals for representation learning.
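The abstract only outlines the mechanism, so here is a minimal, hypothetical sketch of the re-routing idea using HuggingFace transformers, not the authors' implementation: a first forward pass collects the final token's per-layer key-value states, which are then prepended as a one-token prefix cache for a second pass so every token can attend to sequence-level context despite the causal mask. The checkpoint name is an arbitrary open example, the legacy tuple `past_key_values` format and mean-pooling readout are assumptions, and the paper's intrinsic-dimensionality layer selection is omitted (all layers are re-routed here).

```python
# Hypothetical sketch of KV re-routing for training-free embeddings.
# Assumptions: a HuggingFace decoder-only checkpoint, a transformers version
# that still accepts the legacy tuple format for past_key_values, and a simple
# mean-pooling readout. The paper's automated layer selection is not shown.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder: any Qwen/Mistral/Llama backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def kv_embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs["input_ids"]
    attn = inputs["attention_mask"]

    # Pass 1: collect the per-layer KV states of the final token, which the
    # paper treats as a compressed view of the whole sequence.
    out = model(input_ids=input_ids, attention_mask=attn, use_cache=True)
    prefix_kv = tuple(
        (k[:, :, -1:, :], v[:, :, -1:, :])  # keep only the last position
        for k, v in out.past_key_values
    )

    # Pass 2: re-route those states as a one-token prefix cache so every
    # token can attend to sequence-level context within a single extra pass.
    prefix_attn = torch.cat([torch.ones_like(attn[:, :1]), attn], dim=1)
    out2 = model(
        input_ids=input_ids,
        attention_mask=prefix_attn,
        past_key_values=prefix_kv,
        output_hidden_states=True,
    )

    # Mean-pool the final hidden layer over real tokens to get the embedding.
    h = out2.hidden_states[-1]
    mask = attn.unsqueeze(-1).type_as(h)
    return (h * mask).sum(dim=1) / mask.sum(dim=1)

emb = kv_embed("KV states of the final token summarize the sequence.")
print(emb.shape)  # (1, hidden_size)
```

Note that the one-token prefix shifts positions in the second pass by one; how the paper handles positional encodings, pooling, and which layers to re-route (its intrinsic-dimensionality criterion) is not reproduced in this sketch.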