Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning

Abstract

Existing Tool-Integrated Reasoning (TIR) models have effectively extended the question-answering capabilities of LLMs by incorporating external tools. However, real-world scenarios present numerous open-ended problems for which fixed tools often fail to meet task requirements. Furthermore, without a self-optimization mechanism, erroneous tool outputs can mislead the LLM's responses. In addition, constructing existing tools requires significant manual effort, which limits their applicability. Recognizing that the reasoning traces of LLMs encapsulate implicit problem-solving capabilities, we propose UCT, a novel training-free framework that transforms agents from tool users into tool creators. The approach harvests reasoning experiences and distills them into reusable assets, enabling adaptive tool creation and self-updating during inference. We also introduce a memory consolidation mechanism to maintain the tool library, ensuring that retained experiential memory remains highly reusable for subsequent reasoning tasks. This automated tool construction paradigm continuously improves tool quality during reasoning, allowing the overall agent system to evolve without additional training. Extensive experiments demonstrate that our method serves as a new paradigm for enhancing the capabilities of TIR models. In particular, the significant performance gains (+20.86% and +23.04%) achieved on benchmarks spanning multi-domain mathematical and scientific reasoning tasks validate the self-evolving capability of the agent.

