MAXS: Meta-Adaptive Exploration with LLM Agents
January 14, 2026
Authors: Jian Zhang, Zhiyuan Wang, Zhangqi Wang, Yu He, Haoran Luo, Li Yuan, Lingling Zhang, Rui Mao, Qika Lin, Jun Liu
cs.AI
Abstract
Large Language Model (LLM) agents exhibit strong reasoning abilities through the collaboration of multiple tools. However, during agent inference, existing methods often suffer from (i) locally myopic generation, due to the absence of lookahead, and (ii) trajectory instability, where minor early errors escalate into divergent reasoning paths. These issues make it difficult to balance global effectiveness against computational efficiency. To address them, we propose MAXS (https://github.com/exoskeletonzj/MAXS), a meta-adaptive exploration framework for LLM agents that flexibly integrates tool execution with reasoning planning. MAXS employs a lookahead strategy that extends reasoning paths a few steps ahead, estimates the advantage value of tool usage, and combines step-consistency variance with inter-step trend slopes to jointly select stable, consistent, and high-value reasoning steps. Additionally, we introduce a trajectory-convergence mechanism that halts further rollouts once path consistency is achieved, controlling computational cost and balancing resource efficiency against global effectiveness in multi-tool reasoning. Extensive experiments across three base models (MiMo-VL-7B, Qwen2.5-VL-7B, Qwen2.5-VL-32B) and five datasets demonstrate that MAXS consistently outperforms existing methods in both performance and inference efficiency. Further analysis confirms the effectiveness of the lookahead strategy and tool usage.
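To make the step-selection signals concrete, below is a minimal Python sketch of how lookahead rollout values might be combined into a single step score. The function names (`trend_slope`, `score_candidate`), the linear combination, and the weights are illustrative assumptions, not the paper's exact formulation.

```python
import statistics

def trend_slope(values):
    """Least-squares slope of per-step values: a positive slope
    suggests the lookahead path is improving step over step."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2.0
    y_mean = statistics.mean(values)
    denom = sum((i - x_mean) ** 2 for i in range(n))
    return sum((i - x_mean) * (v - y_mean)
               for i, v in enumerate(values)) / denom

def score_candidate(rollouts, var_weight=1.0, slope_weight=1.0):
    """Score one candidate reasoning step from its lookahead rollouts.

    `rollouts` is a list of per-step value sequences (e.g., estimated
    advantages of tool calls a few steps ahead), one sequence per
    rollout. A higher mean final value, lower cross-rollout variance
    (step consistency), and a positive average trend slope all raise
    the score. The weights and the linear form are assumptions.
    """
    finals = [r[-1] for r in rollouts]
    mean_value = statistics.mean(finals)
    consistency_var = statistics.pvariance(finals)
    avg_slope = statistics.mean(trend_slope(r) for r in rollouts)
    return mean_value - var_weight * consistency_var + slope_weight * avg_slope
```

The trajectory-convergence mechanism can likewise be read as an early-stopping loop over sampled rollouts. This sketch assumes a hypothetical caller-supplied `sample_rollout` function and a placeholder agreement threshold; neither is specified by the abstract.

```python
from collections import Counter

def rollout_until_convergence(sample_rollout, max_rollouts=8, agree_ratio=0.75):
    """Stop sampling lookahead rollouts once enough sampled paths
    agree on the same next step, capping compute per decision.

    `sample_rollout` is assumed to return a hashable summary of one
    sampled path (e.g., the chosen next action); `agree_ratio` is an
    illustrative threshold, not a value from the paper.
    """
    outcomes = Counter()
    for i in range(1, max_rollouts + 1):
        outcomes[sample_rollout()] += 1
        step, count = outcomes.most_common(1)[0]
        if i >= 2 and count / i >= agree_ratio:
            return step, i  # converged early: skip remaining rollouts
    return outcomes.most_common(1)[0][0], max_rollouts
```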