Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

May 30, 2026
Authors: Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani
cs.AI

Abstract

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
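
The two mechanisms described above can be pictured with a small sketch. This is not the authors' implementation: the abstract does not specify interfaces, prompts, stopping rules, or the training objective, so every name here (`Attempt`, `refine_query`, `build_triples`, the four callables, `max_rounds`) is a hypothetical stand-in. In particular, the way failed and successful attempts are turned into (query, positive, negative) triples is one plausible reading of "successful and failed refinement trajectories as automatic supervision", not a detail confirmed by the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """One retrieval round in a refinement trajectory (hypothetical structure)."""
    query: str
    instruction: str
    docs: List[str]
    sufficient: bool  # critic's verdict for this round


def refine_query(
    question: str,
    query: str,
    instruction: str,
    retrieve: Callable[[str, str], List[str]],          # (query, instruction) -> docs
    reason: Callable[[str, List[str]], str],            # (question, docs) -> reasoning trace
    critique: Callable[[str, List[str]], Tuple[bool, str]],  # (trace, docs) -> (ok, feedback)
    rewrite: Callable[[str, str, str], Tuple[str, str]],     # (query, instr, feedback) -> new pair
    max_rounds: int = 3,                                # assumed budget, not from the paper
) -> List[Attempt]:
    """Inference-time loop in the spirit of Critic-R-Zero (assumed control flow)."""
    attempts: List[Attempt] = []
    for _ in range(max_rounds):
        docs = retrieve(query, instruction)
        trace = reason(question, docs)           # agent's introspective trace over the evidence
        ok, feedback = critique(trace, docs)     # does the context support the next step?
        attempts.append(Attempt(query, instruction, docs, ok))
        if ok:
            break
        # Use the critic's natural-language feedback to rewrite query and instruction.
        query, instruction = rewrite(query, instruction, feedback)
    return attempts


def build_triples(attempts: List[Attempt]) -> List[Tuple[str, str, str]]:
    """Turn a trajectory into (anchor query, positive doc, negative doc) triples.

    Assumption: documents from the final successful round are positives for the
    ORIGINAL query, and documents seen only in failed rounds are negatives.
    """
    if not attempts or not attempts[-1].sufficient:
        return []  # no successful round, so no positive signal
    anchor = attempts[0].query
    positives = attempts[-1].docs
    negatives = [
        d for a in attempts[:-1] if not a.sufficient
        for d in a.docs if d not in positives
    ]
    return [(anchor, p, n) for p in positives for n in negatives]
```

Triples of this shape could feed a standard contrastive objective for a retriever, which would need no manual relevance labels because the critic's verdicts supply the signal. Whether Critic-Embed uses this exact pairing is not stated in the abstract.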