Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
May 12, 2025
Authors: Ziyang Huang, Xiaowei Yuan, Yiming Ju, Jun Zhao, Kang Liu
cs.AI
Abstract
Retrieval-augmented generation (RAG) is a common strategy to reduce
hallucinations in Large Language Models (LLMs). While reinforcement learning
(RL) can enable LLMs to act as search agents by activating retrieval
capabilities, existing methods often underutilize their internal knowledge. This
can lead to redundant retrievals, potentially harmful knowledge conflicts, and
increased inference latency. Addressing these limitations requires an efficient,
adaptive search agent that can discern when retrieval is warranted and
synergistically integrate parametric (internal) and retrieved (external)
knowledge. This paper introduces the Reinforced
Internal-External Knowledge Synergistic Reasoning Agent (IKEA), which can
identify its own knowledge boundary and prioritize the use of internal
knowledge, resorting to external search only when internal knowledge is deemed
insufficient. This is achieved using a novel knowledge-boundary-aware reward
function and a knowledge-boundary-aware training dataset. These are designed
for RL oriented toward internal-external knowledge synergy, incentivizing the
model to deliver accurate answers, minimize unnecessary retrievals, and perform
appropriate external searches when its own knowledge is lacking. Evaluations
across multiple knowledge reasoning tasks demonstrate that IKEA significantly
outperforms baseline methods, substantially reduces retrieval frequency, and
exhibits robust generalization capabilities.
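
The abstract names three incentives for the reward (accurate answers, few unnecessary retrievals, and searching when internal knowledge falls short) but does not state a formula. The Python sketch below shows one hypothetical shape such a reward could take. The function name, the parameters `max_retrievals`, `efficiency_bonus`, and `search_bonus`, and all numeric values are assumptions made for illustration, not the reward defined in the paper.

```python
def knowledge_boundary_reward(
    answer_correct: bool,
    num_retrievals: int,
    max_retrievals: int = 3,        # assumed cap on searches per question
    efficiency_bonus: float = 0.6,  # assumed bonus for answering with few searches
    search_bonus: float = 0.05,     # assumed small credit for searching when wrong
) -> float:
    """Illustrative reward; every constant here is a placeholder."""
    if num_retrievals < 0 or max_retrievals <= 0:
        raise ValueError("retrieval counts must be non-negative, cap must be positive")

    # Clamp so the efficiency term never goes negative.
    n = min(num_retrievals, max_retrievals)

    if answer_correct:
        # Correct answers earn a base reward of 1.0 plus a bonus that
        # shrinks linearly as more retrievals are used.
        return 1.0 + efficiency_bonus * (1.0 - n / max_retrievals)

    # Wrong answers earn nothing if the model never searched, and a small
    # credit if it at least tried external search when its knowledge failed.
    return search_bonus if n > 0 else 0.0
```

Under these assumed values, a correct answer with no retrieval scores highest, a correct answer that used the full retrieval budget scores the base reward only, and a wrong answer that attempted a search scores slightly above a wrong answer that did not. Keeping the search credit much smaller than the efficiency bonus is a design choice in this sketch, intended to avoid rewarding search for its own sake.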