

FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS

May 22, 2025
作者: Chaeeun Kim, Seungone Kim
cs.AI

Abstract

Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in multi-step reasoning and in calling search engines at appropriate steps. However, existing retrieval-augmented reasoning approaches rely on separate retrieval models, limiting the LRM's role in retrieval to deciding when to retrieve and how to query. This separation not only increases hardware and operational costs but also leads to errors in the retrieval process due to the representation bottleneck, a phenomenon where the retriever's embedding space is not expressive enough to meet the generator's requirements. To address this, we shift our perspective from sequence-to-sequence matching to locating the answer-containing paths within the corpus, and propose a novel framework called FREESON (Retriever-FREE Retrieval-Augmented ReaSONing). This framework enables LRMs to retrieve relevant knowledge on their own by acting as both generator and retriever. To achieve this, we introduce a variant of the MCTS algorithm specialized for the retrieval task, which we call CT-MCTS (Corpus-Traversing Monte Carlo Tree Search). In this algorithm, LRMs traverse the corpus toward answer-containing regions. Our results on five open-domain QA benchmarks, covering single-hop and multi-hop questions, show that FREESON achieves an average improvement of 14.4% in EM and F1 over four multi-step reasoning models with a separate retriever, and it also performs comparably to the strongest baseline, surpassing it by 3% on PopQA and 2WikiMultihopQA.
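To make the "corpus-traversing" idea concrete, the sketch below shows a generic Monte Carlo Tree Search over a linked passage collection. This is not the paper's CT-MCTS (which uses an LRM to guide traversal and score regions); the toy corpus, its link structure, and the simple string-match reward are all hypothetical stand-ins chosen only to illustrate the select/expand/simulate/backpropagate loop.

```python
import math
import random

# Hypothetical toy corpus: passage id -> (text, linked passage ids).
CORPUS = {
    "root": ("Start of search.", ["p1", "p2"]),
    "p1": ("Paris is in France.", ["p3"]),
    "p2": ("Berlin is in Germany.", ["p4"]),
    "p3": ("The Eiffel Tower is in Paris.", []),
    "p4": ("The Brandenburg Gate is in Berlin.", []),
}

def reward(pid, answer):
    # Stand-in reward: 1.0 if the passage text contains the answer string.
    return 1.0 if answer.lower() in CORPUS[pid][0].lower() else 0.0

class Node:
    def __init__(self, pid, parent=None):
        self.pid, self.parent = pid, parent
        self.children = []
        self.visits, self.value = 0, 0.0

    def expand(self):
        self.children = [Node(c, self) for c in CORPUS[self.pid][1]]

    def uct(self, c=1.4):
        # Upper Confidence bound for Trees: exploit mean value, explore rare nodes.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(answer, iterations=50, seed=0):
    random.seed(seed)
    root = Node("root")
    root.expand()
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a leaf of the search tree.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: grow the tree from a visited, non-terminal leaf.
        if node.visits > 0 and CORPUS[node.pid][1]:
            node.expand()
            node = random.choice(node.children)
        # Simulation: random walk through corpus links to a terminal passage.
        pid = node.pid
        while CORPUS[pid][1]:
            pid = random.choice(CORPUS[pid][1])
        r = reward(pid, answer)
        # Backpropagation: credit the whole path for reaching the answer region.
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    # Return the first traversal step whose subtree was most promising.
    return max(root.children, key=lambda n: n.visits).pid

print(mcts("Eiffel Tower"))  # prints "p1": the branch leading toward the answer
```

In FREESON the random rollout and string-match reward would be replaced by the LRM itself, which both proposes which passage to move to next and judges whether a region of the corpus contains the answer.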


PDF (May 26, 2025)