Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning

Abstract

Existing Tool-Integrated Reasoning (TIR) models have effectively extended the question-answering capabilities of LLMs by incorporating external tools. However, real-world scenarios present many open-ended problems where a fixed set of tools often fails to meet task requirements. Furthermore, without a self-optimization mechanism, erroneous tool outputs can mislead the LLM's responses. Constructing tools also requires significant manual effort, which limits their applicability. Recognizing that the reasoning traces of LLMs encapsulate implicit problem-solving capabilities, we propose UCT, a novel training-free framework that transforms agents from tool users into tool creators. The approach harvests reasoning experiences and distills them into reusable assets, enabling adaptive tool creation and self-updating during inference. We also introduce a memory consolidation mechanism to maintain the tool library, ensuring high reusability of the retained experiential memory in subsequent reasoning tasks. This automated tool-construction paradigm continuously improves tool quality during reasoning, allowing the overall agent system to evolve without additional training. Extensive experiments demonstrate that our method serves as a new paradigm for enhancing the capabilities of TIR models. In particular, performance gains of +20.86% and +23.04% on multi-domain mathematical and scientific reasoning benchmarks, respectively, validate the agent's self-evolving capability.
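The create, reuse, and consolidate loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify how tools are retrieved, created, or pruned, so the names `Tool`, `ToolLibrary`, `solve`, `create_tool`, and `is_valid`, the keyword-overlap retrieval, and the failure-rate threshold are all hypothetical. In UCT a tool would be distilled by an LLM from a reasoning trace; a plain Python callable stands in for that step below.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple


@dataclass
class Tool:
    """A reusable tool distilled from a past reasoning trace (hypothetical schema)."""
    name: str
    keywords: FrozenSet[str]
    fn: Callable[..., float]
    uses: int = 0
    failures: int = 0


@dataclass
class ToolLibrary:
    """Holds tools and decides which ones to reuse or discard (illustrative only)."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def retrieve(self, task_keywords: Set[str]) -> Optional[Tool]:
        # Assumption: pick the tool sharing the most keywords with the task.
        best: Optional[Tool] = None
        best_overlap = 0
        for tool in self.tools.values():
            overlap = len(tool.keywords & task_keywords)
            if overlap > best_overlap:
                best, best_overlap = tool, overlap
        return best

    def add(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def consolidate(self, max_failure_rate: float = 0.5) -> None:
        # Assumption: drop tools whose observed failure rate exceeds a threshold.
        self.tools = {
            name: t for name, t in self.tools.items()
            if t.uses == 0 or t.failures / t.uses <= max_failure_rate
        }


def solve(
    library: ToolLibrary,
    task_keywords: Set[str],
    args: Tuple[float, ...],
    create_tool: Callable[[], Tool],
    is_valid: Callable[[float], bool],
) -> float:
    """Reuse a matching tool if one exists; otherwise create one and store it."""
    tool = library.retrieve(task_keywords)
    if tool is None:
        tool = create_tool()  # stand-in for distilling a tool from a reasoning trace
        library.add(tool)
    result = tool.fn(*args)
    tool.uses += 1
    if not is_valid(result):
        tool.failures += 1  # recorded so consolidate() can prune unreliable tools
    return result


if __name__ == "__main__":
    library = ToolLibrary()

    def make_area_tool() -> Tool:
        return Tool("triangle_area", frozenset({"triangle", "area"}),
                    lambda base, height: 0.5 * base * height)

    # First call creates the tool; the second call reuses it.
    print(solve(library, {"triangle", "area"}, (3.0, 4.0), make_area_tool, lambda r: r > 0))
    print(solve(library, {"triangle", "area"}, (6.0, 2.0), make_area_tool, lambda r: r > 0))
    library.consolidate()
    print(sorted(library.tools))
```

The point of the sketch is only the control flow: a task first checks the library, creation happens on a miss, and usage statistics feed a later pruning pass. The actual retrieval, update, and consolidation criteria used by UCT are not stated in the abstract.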
