AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

January 26, 2026
Authors: Mingyang Song, Haoyu Sun, Jiawei Gu, Linjie Li, Luxin Xu, Ranjay Krishna, Yu Cheng
cs.AI

Abstract

When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these components allow models to infer tool utility from task context and intermediate outcomes, enabling coordination of multiple tools and generalization to unseen tools. Empirically, AdaReasoner exhibits strong tool-adaptive and generalization behaviors: it autonomously adopts beneficial tools, suppresses irrelevant ones, and adjusts tool usage frequency based on task demands, despite never being explicitly trained to do so. These capabilities translate into state-of-the-art performance across challenging benchmarks, improving the 7B base model by +24.9% on average and surpassing strong proprietary systems such as GPT-5 on multiple tasks, including VSP and Jigsaw.
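The abstract does not spell out Tool-GRPO's update rule. As a rough illustration only, the sketch below assumes it builds on the usual GRPO idea of scoring each sampled trajectory against the other trajectories drawn for the same prompt, with the reward coming from end-task success. The function name `group_relative_advantages`, the 0/1 reward values, and the group size are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its own group.

    `rewards` holds one end-task reward per sampled trajectory for the
    same prompt. This is an assumed formulation, not the paper's code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four tool-use rollouts for one visual question:
# 1.0 means the final answer was correct, 0.0 means it was wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Under this assumption, trajectories whose tool choices led to a correct answer get a positive advantage and the rest get a negative one, so no per-step label saying which tool was "right" is needed. When every trajectory in a group earns the same reward, all advantages are zero and that group contributes no learning signal.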
January 29, 2026