EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory

Abstract

Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for retrieval. It is tailored to long-context scenarios, where information is dynamic and sequential and requires continuous state tracking. Our design is simple: EvoEmbedding maintains a continuously updated latent memory as it sequentially processes inputs, and uses this memory alongside the raw content to generate evolvable embeddings. Consequently, for the same query, our model adapts its representation to retrieve distinct targets based on the evolving context, going beyond static semantic search. To equip the model with this capability, we construct EvoTrain-180K, a diverse dataset for the joint optimization of latent memory and retrieval. Furthermore, we introduce a memory queue to prevent representation collapse during recurrent encoding, together with a segment-batching technique that handles the large variance in input length and accelerates training by 3.8×. Extensive experiments show that our model not only outperforms larger-scale specialist models (e.g., Qwen3-Embedding-8B and KaLM-Embedding-Gemma3-12B) across a range of long-context retrieval benchmarks, but also generalizes well to downstream tasks (e.g., personalized recommendation) with contexts more than 10× longer than its training window. Notably, EvoEmbedding integrates seamlessly into agentic workflows to boost performance. For instance, a naive retrieval-augmented generation (RAG) pipeline equipped with our model surpasses dedicated agentic memory systems. Project Page: https://clare-nie.github.io/EvoEmbedding.
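To make the idea of a context-dependent query embedding concrete, the following is a toy sketch, not the EvoEmbedding architecture. It assumes sparse bag-of-words vectors in place of a neural encoder and an exponential moving average in place of the learned latent memory. All names (`ToyEvolvingEncoder`, `embed_static`, `retrieve`, `decay`, `mix`) are hypothetical and chosen only for illustration. Candidate documents are embedded statically here, which is a simplification.

```python
import math
from collections import Counter

Vector = dict[str, float]  # sparse vector keyed by token (toy stand-in for a dense embedding)


def normalize(vec: Vector) -> Vector:
    """Scale a sparse vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def embed_static(text: str) -> Vector:
    """Context-free baseline: normalized token counts of the text alone."""
    return normalize({t: float(c) for t, c in Counter(text.lower().split()).items()})


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class ToyEvolvingEncoder:
    """Hypothetical stateful encoder: the embedding depends on what was read before."""

    def __init__(self, decay: float = 0.5, mix: float = 1.0) -> None:
        self.decay = decay          # assumed: how much old memory is retained per step
        self.mix = mix              # assumed: weight of memory in the fused embedding
        self.memory: Vector = {}    # running summary of previously seen inputs

    def encode(self, text: str, update: bool = True) -> Vector:
        content = embed_static(text)
        # Fuse the raw content with the current memory state.
        fused = dict(content)
        for token, weight in self.memory.items():
            fused[token] = fused.get(token, 0.0) + self.mix * weight
        if update:
            # Exponential moving average over inputs, in arrival order.
            tokens = set(self.memory) | set(content)
            self.memory = {
                t: self.decay * self.memory.get(t, 0.0)
                + (1.0 - self.decay) * content.get(t, 0.0)
                for t in tokens
            }
        return normalize(fused)


def retrieve(query_vec: Vector, docs: list[str]) -> str:
    """Return the document whose static embedding is most similar to the query."""
    return max(docs, key=lambda d: cosine(query_vec, embed_static(d)))


if __name__ == "__main__":
    docs = ["berlin office directions", "tokyo office directions"]
    for history in ("alice relocated to berlin", "alice relocated to tokyo"):
        encoder = ToyEvolvingEncoder()
        encoder.encode(history)  # read the context first
        query_vec = encoder.encode("office directions", update=False)
        print(history, "->", retrieve(query_vec, docs))
```

In this sketch the query text "office directions" is identical in both runs, and a static embedding scores the two documents equally. Only the memory accumulated from the preceding input breaks the tie, which is the behaviour the abstract describes as adapting the representation to the evolving context.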