CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

November 24, 2025
作者: Jie He, Richard He Bai, Sinead Williamson, Jeff Z. Pan, Navdeep Jaitly, Yizhe Zhang
cs.AI

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge, but it still struggles with long contexts and with retrieval and generation being optimized separately. In this work, we propose CLaRa (Continuous Latent Reasoning), a unified framework that performs embedding-based compression and joint optimization in a shared continuous space. To obtain semantically rich and retrievable compressed vectors, we introduce SCP, a key-information-preserving data synthesis framework that uses QA and paraphrase supervision. CLaRa then trains the reranker and generator end-to-end with a single language-modeling loss, letting gradients flow through both modules via a differentiable top-k estimator. Theoretically, this unified optimization aligns retrieval relevance with answer quality. Experiments on multiple QA benchmarks show that CLaRa achieves state-of-the-art compression and reranking performance, often surpassing text-based fine-tuned baselines.
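The abstract does not specify which differentiable top-k estimator CLaRa uses, so the following is only a minimal PyTorch sketch of one generic relaxation (iterative softmax with suppression of already-selected items), meant to illustrate how a single language-modeling loss can back-propagate through a soft top-k selection into the reranker. All names here (`soft_top_k`, `tau`, the toy loss) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def soft_top_k(scores: torch.Tensor, k: int, tau: float = 0.5) -> torch.Tensor:
    """Differentiable relaxation of top-k selection (illustrative only).

    Repeats k times: take a softmax over the scores, accumulate it as a soft
    one-hot pick, then damp the picked item so the next round selects another.
    The returned soft mask sums to roughly k, and gradients flow back into
    `scores`.
    """
    s = scores.clone()
    mask = torch.zeros_like(scores)
    for _ in range(k):
        p = F.softmax(s / tau, dim=-1)                    # soft pick of the current best
        mask = mask + p
        s = s + torch.log1p(-p.clamp(max=1 - 1e-6))       # suppress the picked item
    return mask

# Toy end-to-end step: reranker scores gate compressed passage vectors through
# the soft mask, so one loss updates reranker and generator jointly.
reranker_scores = torch.randn(8, requires_grad=True)      # one score per compressed passage
passage_embs = torch.randn(8, 16)                         # compressed passage vectors
weights = soft_top_k(reranker_scores, k=2)
context = weights @ passage_embs                          # soft mixture of selected passages
lm_loss = context.pow(2).mean()                           # placeholder for the generator's LM loss
lm_loss.backward()
assert reranker_scores.grad is not None                   # gradients reach the reranker
```

Because the selection is a differentiable function of the reranker's scores, the generator's language-modeling loss directly supervises which passages get selected; this is the mechanism behind the abstract's claim that unified optimization aligns retrieval relevance with answer quality.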