Critic-R: Improving Agentic Search via Instruction-Tuned Retrievers with Natural Language Introspective Feedback

Abstract

Agentic search systems interact iteratively with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging: it often requires heavy co-training or gold-standard annotations, which limits real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace, produced after the agent consumes the retrieved evidence, to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R comprises two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that uses successful and failed refinement trajectories as automatic supervision, without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. The results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
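
The two mechanisms described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `refine_loop`, `to_training_pairs`, `Attempt`, and the toy `retrieve`, `reason`, `critic`, and `rewrite` functions are hypothetical stand-ins, and the pairing rule (each failed attempt contrasted with the final successful one) is only one plausible reading of "successful and failed refinement trajectories as automatic supervision".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    query: str
    instruction: str
    docs: List[str]
    sufficient: bool  # critic verdict on the agent's reasoning trace


def refine_loop(
    query: str,
    instruction: str,
    retrieve: Callable[[str, str], List[str]],
    reason: Callable[[str, List[str]], str],
    critic: Callable[[str], Tuple[bool, str]],
    rewrite: Callable[[str, str, str], Tuple[str, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Inference-time loop: retrieve, let the agent reason, ask the critic,
    and rewrite the query and instruction until the critic is satisfied."""
    attempts: List[Attempt] = []
    for _ in range(max_rounds):
        docs = retrieve(query, instruction)
        trace = reason(query, docs)    # agent's introspective trace
        ok, feedback = critic(trace)   # natural-language feedback
        attempts.append(Attempt(query, instruction, docs, ok))
        if ok:
            break
        query, instruction = rewrite(query, instruction, feedback)
    return attempts


def to_training_pairs(attempts: List[Attempt]) -> List[Tuple[Attempt, Attempt]]:
    """Assumed supervision rule: pair every failed attempt with the final
    successful one; return nothing if the loop never succeeded."""
    if not attempts or not attempts[-1].sufficient:
        return []
    winner = attempts[-1]
    return [(failed, winner) for failed in attempts[:-1]]


# Toy components (all hypothetical) so the sketch runs end to end.
corpus = {"capital of france": ["Paris is the capital of France."]}


def retrieve(query: str, instruction: str) -> List[str]:
    return corpus.get(query.lower(), [])


def reason(query: str, docs: List[str]) -> str:
    return "I found: " + " ".join(docs) if docs else "I lack evidence."


def critic(trace: str) -> Tuple[bool, str]:
    if "lack evidence" in trace:
        return False, "The query is too vague; name the entity directly."
    return True, ""


def rewrite(query: str, instruction: str, feedback: str) -> Tuple[str, str]:
    # A real system would condition on the feedback; here the rewrite is fixed.
    return "capital of France", instruction


attempts = refine_loop("France main city", "Find supporting passages.",
                       retrieve, reason, critic, rewrite)
pairs = to_training_pairs(attempts)
```

In this toy run the first query retrieves nothing, the critic rejects the resulting trace, and the rewritten query succeeds, so the trajectory yields one (failed, successful) pair that a retriever could in principle be trained on.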