AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning
October 5, 2025
Authors: Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Linrui Xu, Tian Cheng, Guanyu Jiang, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han
cs.AI
Abstract
We present AlphaApollo, a self-evolving agentic reasoning system that aims to
address two bottlenecks in foundation model (FM) reasoning: limited
model-intrinsic capacity and unreliable test-time iteration. AlphaApollo
orchestrates multiple models with professional tools to enable deliberate,
verifiable reasoning. It couples (i) a computation tool (Python with numerical
and symbolic libraries) and (ii) a retrieval tool (task-relevant external
information) to execute exact calculations and ground decisions. The system
further supports multi-round, multi-model solution evolution via a shared state
map that records candidates, executable checks, and feedback for iterative
refinement. In evaluations on AIME 2024/2025 across multiple models,
AlphaApollo delivers consistent gains: +5.15% Average@32 and +23.34% Pass@32
for Qwen2.5-14B-Instruct, and +8.91% Average@32 with +26.67% Pass@32 for
Llama-3.3-70B-Instruct. Tool-use analysis shows that more than 80% of tool
calls are successfully executed, with consistent outperformance of non-tool
baselines, thereby lifting the capability ceiling of FMs. More empirical
results and implementation details will be updated at
https://github.com/tmlr-group/AlphaApollo.
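
The abstract reports Average@32 and Pass@32 without defining them. The sketch below shows one common reading, and it is an assumption rather than the authors' stated definition: Average@k is the per-sample accuracy averaged over problems, and Pass@k is the fraction of problems for which at least one of the k samples is correct. The function names and the toy data are hypothetical and are not taken from the AlphaApollo code.

```python
from typing import Sequence


def average_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Assumed Average@k: mean per-sample accuracy, averaged over problems.

    results[i][j] is True if sample j for problem i was graded correct.
    """
    per_problem = [sum(samples) / len(samples) for samples in results]
    return sum(per_problem) / len(per_problem)


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Assumed Pass@k: fraction of problems with at least one correct sample."""
    return sum(any(samples) for samples in results) / len(results)


# Hypothetical toy data: 3 problems with k = 4 samples each (not real results).
toy_results = [
    [True, False, False, False],   # solved once out of four samples
    [False, False, False, False],  # never solved
    [True, True, True, True],      # always solved
]

if __name__ == "__main__":
    print(f"Average@4 = {average_at_k(toy_results):.4f}")
    print(f"Pass@4    = {pass_at_k(toy_results):.4f}")
```

Under this reading, Pass@k can never be lower than Average@k for the same samples, since a single correct sample is enough for a problem to count under Pass@k. That fits with the two metrics being reported separately, although the abstract does not explain why the Pass@32 gains are the larger ones.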