Critic-R: Improving Agentic Search with Instruction-Tuned Retrievers and Natural-Language Introspective Feedback

Abstract

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
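The two mechanisms described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`refine`, `contrastive_triples`), the `Attempt` record, the toy corpus, and the keyword-overlap retriever are all hypothetical stand-ins, and the reflection and critic steps that would normally be LLM calls are replaced by hard-coded rules. The first function loosely mirrors an inference-time refinement loop in the spirit of Critic-R-Zero; the second shows one assumed way that successful and failed attempts could be turned into annotation-free training triples in the spirit of Critic-Embed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """One retrieval round: what was asked, what came back, and the critic's verdict."""
    query: str
    instruction: str
    passages: List[str]
    sufficient: bool


def refine(question: str,
           retrieve: Callable[[str, str], List[str]],
           reflect: Callable[[str, List[str]], str],
           critic: Callable[[str], Tuple[bool, str]],
           rewrite: Callable[[str, str, str], Tuple[str, str]],
           instruction: str,
           max_rounds: int = 3) -> List[Attempt]:
    """Retrieve, let the agent reflect, ask the critic, and rewrite until sufficient."""
    query, attempts = question, []
    for _ in range(max_rounds):
        passages = retrieve(query, instruction)
        trace = reflect(question, passages)        # agent's introspective trace
        sufficient, feedback = critic(trace)       # critic judges the trace
        attempts.append(Attempt(query, instruction, passages, sufficient))
        if sufficient:
            break
        query, instruction = rewrite(query, instruction, feedback)
    return attempts


def contrastive_triples(question: str,
                        attempts: List[Attempt]) -> List[Tuple[str, str, str]]:
    """Assumed supervision scheme: passages from the successful attempt are
    positives, passages seen only in failed attempts are negatives."""
    winners = [a for a in attempts if a.sufficient]
    if not winners:
        return []
    positives = winners[-1].passages
    negatives = [p for a in attempts if not a.sufficient
                 for p in a.passages if p not in positives]
    return [(question, pos, neg) for pos in positives for neg in negatives]


# ---- Toy stand-ins (hypothetical; real components would be models) ----
CORPUS = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Paris is the capital of France.",
]


def toy_retrieve(query: str, instruction: str, k: int = 1) -> List[str]:
    # Keyword overlap only; an instruction-tuned retriever would also use `instruction`.
    words = set(query.lower().replace("?", "").split())
    ranked = sorted(CORPUS,
                    key=lambda p: -len(words & set(p.lower().rstrip(".").split())))
    return ranked[:k]


def toy_reflect(question: str, passages: List[str]) -> str:
    # Hard-coded stand-in for the agent's reasoning over the evidence.
    if "Poland" in " ".join(passages):
        return "supported"
    return "missing: which country Warsaw is the capital of"


def toy_critic(trace: str) -> Tuple[bool, str]:
    if trace.startswith("missing:"):
        return False, trace[len("missing:"):].strip()
    return True, ""


def toy_rewrite(query: str, instruction: str, feedback: str) -> Tuple[str, str]:
    return feedback, instruction + " Focus on: " + feedback


QUESTION = "Which country was Marie Curie born in?"
attempts = refine(QUESTION, toy_retrieve, toy_reflect, toy_critic, toy_rewrite,
                  "Retrieve passages that answer the question.")
triples = contrastive_triples(QUESTION, attempts)
```

In this toy run the first round retrieves only the birthplace passage, the critic flags the missing hop, and the rewritten query retrieves the passage naming the country. The resulting triple pairs the helpful passage against the unhelpful one without any manual relevance label, which is the kind of signal a contrastive retriever objective could consume.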