
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

November 24, 2025
作者: Jie He, Richard He Bai, Sinead Williamson, Jeff Z. Pan, Navdeep Jaitly, Yizhe Zhang
cs.AI

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In this work, we propose CLaRa (Continuous Latent Reasoning), a unified framework that performs embedding-based compression and joint optimization in a shared continuous space. To obtain semantically rich and retrievable compressed vectors, we introduce SCP, a key-preserving data synthesis framework using QA and paraphrase supervision. CLaRa then trains the reranker and generator end-to-end via a single language modeling loss, with gradients flowing through both modules using a differentiable top-k estimator. Theoretically, this unified optimization aligns retrieval relevance with answer quality. Experiments across multiple QA benchmarks show that CLaRa achieves state-of-the-art compression and reranking performance, often surpassing text-based fine-tuned baselines.
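The abstract does not specify how the differentiable top-k estimator is constructed. As a rough illustration of how gradients from a language-modeling loss could flow through a top-k reranking step, here is a generic successive-softmax relaxation of top-k selection; the function name, temperature parameter, and the particular relaxation are all illustrative assumptions, not CLaRa's actual estimator:

```python
import numpy as np

def soft_topk(scores, k, tau=1.0):
    """Relaxed top-k selection weights (illustrative sketch, not the
    paper's estimator). Each iteration takes a softmax over the
    temperature-scaled scores, accumulates that soft selection mass,
    and damps items that have already absorbed mass. Implemented in an
    autodiff framework, every step is differentiable, so a downstream
    loss can propagate gradients back into the reranker's scores."""
    logits = np.asarray(scores, dtype=float) / tau
    weights = np.zeros_like(logits)
    for _ in range(k):
        # numerically stable softmax
        z = np.exp(logits - logits.max())
        p = z / z.sum()
        weights += p
        # suppress already-selected mass so the next softmax
        # concentrates on the remaining items
        logits += np.log(np.clip(1.0 - p, 1e-9, None))
    return weights
```

The returned weights sum to exactly k and concentrate on the k highest-scoring passages as tau shrinks, approaching hard top-k selection while remaining smooth enough for end-to-end training.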