Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning

Abstract

Existing Tool-Integrated Reasoning (TIR) models extend the question-answering capabilities of LLMs by incorporating external tools. However, real-world scenarios present many open-ended problems that fixed tools cannot handle. Furthermore, because these models lack a self-optimization mechanism, erroneous tool outputs can mislead the LLM's responses. In addition, constructing tools requires significant manual effort, which limits their applicability. Recognizing that the reasoning traces of LLMs encapsulate implicit problem-solving capabilities, we propose UCT, a novel training-free framework that transforms agents from tool users into tool creators. UCT harvests reasoning experiences and distills them into reusable assets, enabling adaptive tool creation and self-updating during inference. We also introduce a memory consolidation mechanism to maintain the tool library, ensuring that retained experiential memory remains highly reusable in subsequent reasoning tasks. This automated tool construction paradigm continuously improves tool quality during reasoning, allowing the overall agent system to progress without additional training. Extensive experiments demonstrate that our method serves as a novel paradigm for enhancing the capabilities of TIR models. In particular, the significant performance gains of +20.86%↑ and +23.04%↑ on benchmarks spanning multi-domain mathematical and scientific reasoning tasks validate the self-evolving capability of the agent.
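
The abstract does not specify how tools are stored, retrieved, or consolidated, so the following is only a toy sketch of the general idea (a tool library that grows during inference, updates tools in place, and periodically prunes rarely reused entries). Every name here (`Tool`, `ToolLibrary`, `add`, `retrieve`, `consolidate`), the keyword-overlap retrieval, and the reuse-count threshold are assumptions made for illustration and are not taken from UCT itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Tool:
    """A reusable asset distilled from a past reasoning trace (hypothetical structure)."""
    name: str
    fn: Callable[..., object]
    keywords: frozenset
    uses: int = 0  # number of times the tool has been retrieved for reuse


@dataclass
class ToolLibrary:
    """Toy tool library; not the paper's actual data structure."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def add(self, name: str, fn: Callable[..., object], keywords: Iterable[str]) -> None:
        # Re-adding an existing name replaces its function (a crude stand-in for
        # self-updating) while preserving the accumulated reuse count.
        previous = self.tools.get(name)
        uses = previous.uses if previous is not None else 0
        self.tools[name] = Tool(name, fn, frozenset(k.lower() for k in keywords), uses)

    def retrieve(self, query: str) -> Optional[Tool]:
        # Assumed retrieval rule: pick the tool whose keywords overlap most with the query.
        words = set(query.lower().split())
        best, best_score = None, 0
        for tool in self.tools.values():
            score = len(words & tool.keywords)
            if score > best_score:
                best, best_score = tool, score
        if best is not None:
            best.uses += 1
        return best

    def consolidate(self, min_uses: int) -> List[str]:
        # Assumed consolidation rule: drop tools reused fewer than `min_uses` times.
        removed = [name for name, tool in self.tools.items() if tool.uses < min_uses]
        for name in removed:
            del self.tools[name]
        return removed


# Usage: two tools are created, one is reused, and consolidation prunes the other.
lib = ToolLibrary()
lib.add("triangle_area", lambda base, height: 0.5 * base * height, {"triangle", "area"})
lib.add("celsius_to_kelvin", lambda c: c + 273.15, {"celsius", "kelvin"})
tool = lib.retrieve("area of a triangle with base 4 and height 3")
area = tool.fn(4, 3)
miss = lib.retrieve("unrelated question")
removed = lib.consolidate(min_uses=1)
```

In this sketch the reuse counter is the only signal driving consolidation; a real system would presumably also weigh tool correctness and overlap between tools, which the abstract does not detail.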
