Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models

Abstract

Despite their remarkable natural language understanding capabilities, Large Language Models (LLMs) have been underutilized for retrieval tasks. We present Search-R3, a novel framework that addresses this limitation by adapting LLMs to generate search embeddings as a direct output of their reasoning process. Our approach exploits LLMs' chain-of-thought capabilities, allowing them to produce more effective embeddings by reasoning step by step through complex semantic analyses. We implement this through three complementary mechanisms: (1) a supervised learning stage that establishes the model's ability to produce quality embeddings, (2) a reinforcement learning (RL) methodology that optimizes embedding generation alongside reasoning, and (3) a specialized RL environment that efficiently handles evolving embedding representations without requiring complete corpus re-encoding at each training iteration. Our extensive evaluations on diverse benchmarks demonstrate that Search-R3 significantly outperforms prior methods by unifying the reasoning and embedding generation processes. This integrated post-training approach represents a substantial advancement in handling complex knowledge-intensive tasks that require both sophisticated reasoning and effective information retrieval. Project page: https://github.com/ytgui/Search-R3
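The abstract does not say how the RL environment avoids re-encoding the whole corpus, so the following is only a minimal sketch of one plausible reading: keep cached document embeddings and re-encode just the documents involved in the current training step. Every name here (`PartialRefreshIndex`, `make_encoder`, `VOCAB`) is hypothetical, and the toy weighted-word-count encoder merely stands in for a model checkpoint; none of it is taken from the Search-R3 code.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence

Vector = List[float]

# Hypothetical toy vocabulary; a real system would use LLM-generated embeddings.
VOCAB = ["reasoning", "embedding", "retrieval", "corpus"]


def make_encoder(weights: Sequence[float]) -> Callable[[str], Vector]:
    """Toy stand-in for a model checkpoint: weighted word counts over VOCAB."""
    def encode(text: str) -> Vector:
        words = text.lower().split()
        return [w * words.count(term) for w, term in zip(weights, VOCAB)]
    return encode


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class PartialRefreshIndex:
    """Assumed design: cache document embeddings, refresh only selected ones."""

    def __init__(self, docs: Dict[str, str], encode: Callable[[str], Vector]):
        self.docs = dict(docs)
        # The corpus is encoded in full exactly once, at construction time.
        self.cache = {doc_id: encode(text) for doc_id, text in self.docs.items()}

    def refresh(self, doc_ids: Iterable[str], encode: Callable[[str], Vector]) -> None:
        """Re-encode only the given documents with the updated encoder."""
        for doc_id in doc_ids:
            self.cache[doc_id] = encode(self.docs[doc_id])

    def search(self, query_vec: Vector, k: int) -> List[str]:
        """Return the ids of the k cached documents most similar to the query."""
        ranked = sorted(
            self.cache,
            key=lambda doc_id: cosine(query_vec, self.cache[doc_id]),
            reverse=True,
        )
        return ranked[:k]


docs = {
    "d1": "reasoning improves retrieval",
    "d2": "embedding the corpus",
    "d3": "retrieval over a corpus",
}
old_encoder = make_encoder([1.0, 1.0, 1.0, 1.0])  # encoder before an update
new_encoder = make_encoder([2.0, 1.0, 1.0, 1.0])  # encoder after an update

index = PartialRefreshIndex(docs, old_encoder)
index.refresh(["d1"], new_encoder)  # d2 and d3 keep their older embeddings
top = index.search(new_encoder("reasoning"), k=1)
```

In this sketch the untouched documents are scored with slightly stale embeddings, which trades some ranking accuracy for a per-step cost that depends on the batch rather than on the corpus size. Whether Search-R3 makes the same trade-off is not stated in the abstract.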
