Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning
February 2, 2026
Authors: Xintian Shen, Jiawei Chen, Lihao Zheng, Hao Ma, Tao Wei, Kun Zhan
cs.AI
Abstract
Existing Tool-Integrated Reasoning (TIR) models extend the question-answering capabilities of LLMs by incorporating external tools. However, real-world scenarios present many open-ended problems for which a fixed set of tools often fails to meet task requirements. Without a self-optimization mechanism, erroneous tool outputs can also mislead the LLM's responses. In addition, constructing existing tools requires significant manual effort, which further constrains their applicability. Recognizing that the reasoning traces of LLMs encapsulate implicit problem-solving capabilities, we propose UCT, a novel training-free framework that transforms agents from tool users into tool creators. The approach harvests reasoning experiences and distills them into reusable assets, enabling adaptive tool creation and self-updating during inference. We also introduce a memory consolidation mechanism to maintain the tool library, ensuring that retained experiential memory is highly reusable in subsequent reasoning tasks. This automated tool construction paradigm continuously improves tool quality during reasoning, allowing the overall agent system to keep evolving without additional training. Extensive experiments demonstrate that our method offers a new paradigm for enhancing the capabilities of TIR models. In particular, significant performance gains (+20.86%↑ and +23.04%↑) on benchmarks spanning multi-domain mathematical and scientific reasoning tasks validate the agent's self-evolving capability.
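The abstract describes three moving parts: creating tools from reasoning experience, updating them during inference, and consolidating the tool library so that only reusable entries are kept. The following is a minimal illustrative sketch of that general idea, not the UCT implementation. Every name in it (`Tool`, `ToolLibrary`, `create_or_update`, `call`, `consolidate`, the `min_uses` threshold) is a hypothetical assumption, and the usage-count pruning rule is a simple stand-in for whatever consolidation criterion the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A reusable asset distilled from a reasoning trace (hypothetical structure)."""
    name: str
    description: str
    func: Callable[..., object]
    uses: int = 0      # successful reuses so far
    failures: int = 0  # calls that raised an exception


@dataclass
class ToolLibrary:
    """Toy tool library; an assumption for illustration, not the paper's design."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def create_or_update(self, name: str, description: str,
                         func: Callable[..., object]) -> Tool:
        # Registering under an existing name replaces the implementation
        # (a stand-in for self-updating) while keeping its reuse count.
        old = self.tools.get(name)
        tool = Tool(name, description, func, uses=old.uses if old else 0)
        self.tools[name] = tool
        return tool

    def call(self, name: str, *args: object) -> object:
        tool = self.tools[name]
        try:
            result = tool.func(*args)
        except Exception:
            tool.failures += 1
            raise
        tool.uses += 1
        return result

    def consolidate(self, min_uses: int = 1) -> List[str]:
        # Assumed pruning rule: keep tools that were reused at least
        # `min_uses` times and did not fail more often than they succeeded.
        kept = {
            n: t for n, t in self.tools.items()
            if t.uses >= min_uses and t.failures <= t.uses
        }
        removed = sorted(set(self.tools) - set(kept))
        self.tools = kept
        return removed


# Usage: one tool gets reused, one never does, then the library is consolidated.
lib = ToolLibrary()
lib.create_or_update("triangle_area", "area from base and height",
                     lambda b, h: 0.5 * b * h)
lib.create_or_update("unused_helper", "never reused", lambda x: x)
area = lib.call("triangle_area", 4, 3)
removed = lib.consolidate()
```

In this sketch a tool survives consolidation only if it has actually been reused, which mirrors the abstract's stated goal of retaining highly reusable experience. A real system would presumably generate `func` from the model's reasoning trace rather than from a hand-written lambda.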