Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent

May 12, 2025
Authors: Ziyang Huang, Xiaowei Yuan, Yiming Ju, Jun Zhao, Kang Liu
cs.AI

Abstract

Retrieval-augmented generation (RAG) is a common strategy for reducing hallucinations in Large Language Models (LLMs). While reinforcement learning (RL) can enable LLMs to act as search agents by activating their retrieval capabilities, existing approaches often underutilize the models' internal knowledge. This can lead to redundant retrievals, potentially harmful knowledge conflicts, and increased inference latency. To address these limitations, an efficient and adaptive search agent is urgently needed, one that can discern the optimal time to retrieve and synergistically integrate parametric (internal) and retrieved (external) knowledge. This paper introduces the Reinforced Internal-External Knowledge Synergistic Reasoning Agent (IKEA), which can identify its own knowledge boundary and prioritize the use of internal knowledge, resorting to external search only when internal knowledge is deemed insufficient. This is achieved with a novel knowledge-boundary aware reward function and a knowledge-boundary aware training dataset. Both are designed for RL oriented toward internal-external knowledge synergy, incentivizing the model to deliver accurate answers, minimize unnecessary retrievals, and perform appropriate external searches when its own knowledge is lacking. Evaluations across multiple knowledge reasoning tasks demonstrate that IKEA significantly outperforms baseline methods, substantially reduces retrieval frequency, and exhibits robust generalization capabilities.
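The abstract names three incentives for the reward: accurate answers, few unnecessary retrievals, and searching when internal knowledge falls short. The following is a minimal sketch of how such a reward could be shaped, not the paper's actual formula. The names `Rollout` and `knowledge_boundary_reward`, the retrieval cap `max_retrievals`, and the constants `efficiency_bonus` and `search_credit` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One agent trajectory (hypothetical structure, for illustration only)."""
    answer_correct: bool  # whether the final answer matched the reference
    num_retrievals: int   # how many external search calls were issued


def knowledge_boundary_reward(
    rollout: Rollout,
    max_retrievals: int = 3,        # assumed cap on searches per question
    efficiency_bonus: float = 0.2,  # assumed maximum bonus for avoiding searches
    search_credit: float = 0.05,    # assumed small credit for trying to search
) -> float:
    """Illustrative reward: accuracy first, then retrieval efficiency."""
    # Clamp the retrieval count into [0, max_retrievals].
    n = min(max(rollout.num_retrievals, 0), max_retrievals)

    if rollout.answer_correct:
        # Correct answers earn the base reward plus a bonus that shrinks
        # linearly as more retrievals are used.
        return 1.0 + efficiency_bonus * (1.0 - n / max_retrievals)

    # Wrong answers earn no base reward. A small credit is kept when the
    # agent at least searched, so that it is not pushed toward answering
    # purely from memory when its internal knowledge is lacking.
    return search_credit if n > 0 else 0.0


if __name__ == "__main__":
    for rollout in (
        Rollout(answer_correct=True, num_retrievals=0),
        Rollout(answer_correct=True, num_retrievals=3),
        Rollout(answer_correct=False, num_retrievals=0),
        Rollout(answer_correct=False, num_retrievals=2),
    ):
        print(rollout, knowledge_boundary_reward(rollout))
```

Under these assumed constants, a correct answer from internal knowledge alone scores highest, a correct answer that used every allowed search scores lower, and a wrong answer given without any search scores lowest, which mirrors the ordering of priorities described in the abstract.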
