Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
May 12, 2025
Authors: Ziyang Huang, Xiaowei Yuan, Yiming Ju, Jun Zhao, Kang Liu
cs.AI
Abstract
Retrieval-augmented generation (RAG) is a common strategy to reduce
hallucinations in Large Language Models (LLMs). While reinforcement learning
(RL) can enable LLMs to act as search agents by activating retrieval
capabilities, existing approaches often underutilize the models' internal
knowledge. This can lead to redundant retrievals, potentially harmful knowledge
conflicts, and increased inference latency. To address these limitations, an
efficient and adaptive search agent is urgently needed, one capable of
discerning the optimal retrieval timing and synergistically integrating
parametric (internal) and retrieved (external) knowledge. This paper introduces
the Reinforced Internal-External Knowledge Synergistic Reasoning Agent (IKEA),
which can identify its own knowledge boundary and prioritize the use of
internal knowledge, resorting to external search only when internal knowledge
is deemed insufficient. This is achieved using a novel knowledge-boundary aware
reward function and a knowledge-boundary aware training dataset. These are
designed for RL oriented toward internal-external knowledge synergy,
incentivizing the model to deliver accurate answers, minimize unnecessary
retrievals, and search externally when its own knowledge is lacking.
Evaluations across multiple knowledge reasoning tasks demonstrate that IKEA
significantly outperforms baseline methods, substantially reduces retrieval
frequency, and exhibits robust generalization capabilities.
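
The abstract does not give the reward formula, so the following is only a minimal sketch of what a knowledge-boundary aware reward might look like under stated assumptions. It rewards correct answers, adds a bonus that shrinks as more retrievals are used, and gives a small bonus for searching when the answer turns out to be wrong. The names `Rollout` and `boundary_aware_reward` and all constants are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled answer trajectory (hypothetical structure, not from the paper)."""
    answer_correct: bool  # whether the final answer matches the reference
    num_retrievals: int   # number of external search calls made


def boundary_aware_reward(
    rollout: Rollout,
    max_retrievals: int = 3,        # assumed cap on search calls per question
    answer_reward: float = 1.0,     # assumed reward for a correct answer
    efficiency_bonus: float = 0.5,  # assumed maximum bonus for avoiding search
    search_bonus: float = 0.1,      # assumed bonus for searching when wrong
) -> float:
    """Illustrative reward: accuracy first, then retrieval economy."""
    used = min(rollout.num_retrievals, max_retrievals)
    if rollout.answer_correct:
        # Correct answer: the fewer retrievals used, the larger the bonus.
        return answer_reward + efficiency_bonus * (1 - used / max_retrievals)
    # Wrong answer: internal knowledge was insufficient, so having searched
    # at least once earns a small bonus over answering purely from memory.
    return search_bonus if used > 0 else 0.0
```

Under these assumed constants, a correct answer with no retrieval scores highest, a correct answer that used the full retrieval budget scores lower, and a wrong answer scores lowest, with a slight preference for having searched.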