EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory

Abstract

Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for retrieval. It is tailored for long-context scenarios, where information is dynamic and sequential and requires continuous state tracking. Our design is simple: EvoEmbedding maintains a latent memory that is continuously updated as it processes inputs sequentially, and uses this memory alongside the raw content to jointly generate evolvable embeddings. Consequently, for the same query, our model adapts its representation to the evolving context and can retrieve distinct targets, going beyond static semantic search. To equip the model with this capability, we construct EvoTrain-180K, a diverse dataset for the joint optimization of latent memory and retrieval. Furthermore, we introduce a memory queue to prevent representation collapse during recurrent encoding, alongside segment-batching techniques that handle large length variance and accelerate training by 3.8×. Extensive experiments show that our model not only outperforms larger-scale specialists (e.g., Qwen3-Embedding-8B and KaLM-Embedding-Gemma3-12B) across a range of long-context retrieval benchmarks, but also generalizes well to downstream tasks (e.g., personalization) with contexts 10× longer than its training window. Notably, EvoEmbedding integrates seamlessly into agentic workflows and improves their performance. For instance, a naive RAG pipeline equipped with our model surpasses dedicated agentic memory systems. Project Page: https://clare-nie.github.io/EvoEmbedding
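To make the contrast between static and evolvable embeddings concrete, the following toy sketch may help. It is not the EvoEmbedding architecture: the hashed bag-of-words encoder, the exponential-moving-average memory update, and all names (`static_embed`, `ToyEvolvingEncoder`, `decay`, `mix`, `DIM`) are hypothetical stand-ins chosen only to illustrate how a latent memory carried across sequential inputs can make the same query map to different vectors.

```python
import hashlib
import math

DIM = 64  # toy embedding dimension (assumption, not from the paper)


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def static_embed(text):
    """Hashed bag-of-words vector: a stand-in for a static encoder."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = digest[0] % DIM
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[index] += sign
    return normalize(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class ToyEvolvingEncoder:
    """Keeps a latent memory that is updated after every encoded segment."""

    def __init__(self, decay=0.5, mix=0.5):
        self.decay = decay          # how much of the old memory is kept
        self.mix = mix              # weight of the memory in the output
        self.memory = [0.0] * DIM   # latent memory, initially empty

    def encode(self, text):
        content = static_embed(text)
        # The embedding is produced jointly from raw content and memory.
        embedding = normalize(
            [c + self.mix * m for c, m in zip(content, self.memory)]
        )
        # Update the memory only after the embedding has been produced.
        self.memory = [
            self.decay * m + (1.0 - self.decay) * c
            for m, c in zip(self.memory, content)
        ]
        return embedding


query = "when is the next meeting"

encoder_a = ToyEvolvingEncoder()
encoder_a.encode("the meeting moved to friday afternoon")
query_a = encoder_a.encode(query)

encoder_b = ToyEvolvingEncoder()
encoder_b.encode("the budget review was cancelled yesterday")
query_b = encoder_b.encode(query)

# A static encoder gives one vector per query; the toy evolving encoder
# gives a context-dependent vector for the same query text.
print("static vectors identical:", static_embed(query) == static_embed(query))
print("evolving vectors identical:", query_a == query_b)
print("cosine(query_a, query_b):", round(cosine(query_a, query_b), 4))
```

In this sketch the memory is a fixed-rule running average. The model described above instead learns its latent memory jointly with retrieval, so the example only conveys the interface, that is, sequential encoding with state, and not the training procedure or the memory queue.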