
VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning

December 6, 2025
作者: Yuji Wang, Wenlong Liu, Jingxuan Niu, Haoji Zhang, Yansong Tang
cs.AI

Abstract

Tool-integrated visual reasoning (TiVR) has demonstrated great potential in enhancing multimodal problem-solving. However, existing TiVR paradigms mainly focus on integrating various visual tools through reinforcement learning, while neglecting to design effective response mechanisms for handling unreliable or erroneous tool outputs. This limitation is particularly pronounced in referring and grounding tasks, where inaccurate detection tool predictions often mislead TiVR models into generating hallucinated reasoning. To address this issue, we propose the VG-Refiner, the first framework aiming at the tool-refined referring grounded reasoning. Technically, we introduce a two-stage think-rethink mechanism that enables the model to explicitly analyze and respond to tool feedback, along with a refinement reward that encourages effective correction in response to poor tool results. In addition, we propose two new metrics and establish fair evaluation protocols to systematically measure the refinement ability of current models. We adopt a small amount of task-specific data to enhance the refinement capability of VG-Refiner, achieving a significant improvement in accuracy and correction ability on referring and reasoning grounding benchmarks while preserving the general capabilities of the pretrained model.
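The abstract does not give the form of the refinement reward, but for grounding tasks the natural accuracy signal is box IoU against ground truth. Below is a minimal, hypothetical sketch of such a reward: the `refinement_reward` function, the `threshold` value, and the reward magnitudes are all illustrative assumptions, not the paper's actual design. It pays the model only when its rethink stage keeps a good tool box or corrects a poor one.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def refinement_reward(tool_box, refined_box, gt_box, threshold=0.5):
    """Hypothetical refinement reward (illustrative, not the paper's exact
    formulation): reward the rethink stage for fixing a poor detection-tool
    prediction, and for not degrading an already-correct one."""
    tool_iou = iou(tool_box, gt_box)
    refined_iou = iou(refined_box, gt_box)
    if tool_iou < threshold:
        # Tool output was unreliable: reward only a successful correction.
        return 1.0 if refined_iou >= threshold else 0.0
    # Tool output was already good: reward keeping or improving it.
    return 1.0 if refined_iou >= tool_iou else 0.0
```

Such a reward shapes the second (rethink) stage to respond to tool feedback explicitly rather than copying the tool's box verbatim, which is the failure mode the paper attributes to prior TiVR models.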
December 10, 2025