MAXS: Meta-Adaptive Exploration with LLM Agents

January 14, 2026
Authors: Jian Zhang, Zhiyuan Wang, Zhangqi Wang, Yu He, Haoran Luo, Li Yuan, Lingling Zhang, Rui Mao, Qika Lin, Jun Liu
cs.AI

Abstract

Large Language Model (LLM) agents exhibit inherent reasoning abilities through the collaboration of multiple tools. However, during agent inference, existing methods often suffer from (i) locally myopic generation, due to the absence of lookahead, and (ii) trajectory instability, where minor early errors can escalate into divergent reasoning paths. These issues make it difficult to balance global effectiveness and computational efficiency. To address them, we propose MAXS (Meta-Adaptive Exploration with LLM Agents; https://github.com/exoskeletonzj/MAXS), a meta-adaptive reasoning framework based on LLM agents that flexibly integrates tool execution and reasoning planning. MAXS employs a lookahead strategy that extends reasoning paths a few steps ahead to estimate the advantage value of tool usage, and combines step consistency variance with inter-step trend slopes to jointly select stable, consistent, and high-value reasoning steps. Additionally, we introduce a trajectory convergence mechanism that controls computational cost by halting further rollouts once path consistency is achieved, enabling a balance between resource efficiency and global effectiveness in multi-tool reasoning. We conduct extensive empirical studies across three base models (MiMo-VL-7B, Qwen2.5-VL-7B, Qwen2.5-VL-32B) and five datasets, demonstrating that MAXS consistently outperforms existing methods in both performance and inference efficiency. Further analysis confirms the effectiveness of our lookahead strategy and tool usage mechanism.
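The abstract names three signals for choosing a reasoning step (advantage value, step consistency variance, and inter-step trend slope) plus a convergence check, but it gives no formulas. The following is a minimal illustrative sketch of how such signals could be combined, not the authors' implementation. The linear scoring rule, the weights `alpha` and `beta`, the threshold `tol`, and all function names are hypothetical assumptions introduced here.

```python
from statistics import mean, pvariance


def trend_slope(values):
    """Least-squares slope of values against their step index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def score_candidate(lookahead_values, baseline, alpha=1.0, beta=1.0):
    """Hypothetical score: reward advantage and upward trend, penalize variance."""
    advantage = mean(lookahead_values) - baseline  # gain over the current estimate
    variance = pvariance(lookahead_values)         # instability across lookahead steps
    slope = trend_slope(lookahead_values)          # direction of the value trend
    return advantage - alpha * variance + beta * slope


def select_step(candidates, baseline, alpha=1.0, beta=1.0):
    """Return the name of the candidate step with the highest combined score."""
    return max(
        candidates,
        key=lambda name: score_candidate(candidates[name], baseline, alpha, beta),
    )


def has_converged(candidate_scores, tol=1e-2):
    """Assumed stopping rule: halt rollouts once candidate scores nearly agree."""
    return pvariance(candidate_scores) <= tol


if __name__ == "__main__":
    # Made-up value estimates for two candidate steps over a 3-step lookahead.
    candidates = {
        "call_tool": [0.5, 0.6, 0.7],
        "reason_only": [0.9, 0.2, 0.6],
    }
    print(select_step(candidates, baseline=0.5))
    print(has_converged([0.19, 0.20, 0.21]))
```

In this toy setup the steadily improving candidate is preferred over the erratic one even though their mean values are close, which is the intuition behind combining variance and slope with the advantage estimate. The convergence check stands in for the trajectory convergence mechanism; the actual criterion used by MAXS may differ.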