

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

January 26, 2026
Authors: Mingyang Song, Haoyu Sun, Jiawei Gu, Linjie Li, Luxin Xu, Ranjay Krishna, Yu Cheng
cs.AI

Abstract

When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these components allow models to infer tool utility from task context and intermediate outcomes, enabling coordination of multiple tools and generalization to unseen tools. Empirically, AdaReasoner exhibits strong tool-adaptive and generalization behaviors: it autonomously adopts beneficial tools, suppresses irrelevant ones, and adjusts tool usage frequency based on task demands, despite never being explicitly trained to do so. These capabilities translate into state-of-the-art performance across challenging benchmarks, improving the 7B base model by 24.9% on average and surpassing strong proprietary systems such as GPT-5 on multiple tasks, including VSP and Jigsaw.
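The abstract says Tool-GRPO optimizes tool selection and sequencing based on end-task success but gives no further detail. As a rough illustration only, the sketch below shows the group-relative advantage step used in standard GRPO, on which the name suggests Tool-GRPO builds. It is not the authors' algorithm: the function name, the binary outcome reward, the use of the population standard deviation, and the `eps` term are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt.

    Standard GRPO scores each rollout relative to its siblings:
    advantage = (reward - group mean) / (group std + eps).
    Population std and the eps value are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four tool-using rollouts for one question, with an
# assumed outcome reward of 1.0 when the final answer is correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# When every rollout gets the same reward, no rollout is preferred.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Under these assumptions, rollouts that reach a correct answer receive a positive advantage and the others a negative one, so any tool calls along a successful trajectory are reinforced without per-tool supervision.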
January 29, 2026