AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning

October 5, 2025
Authors: Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Linrui Xu, Tian Cheng, Guanyu Jiang, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han
cs.AI

Abstract

We present AlphaApollo, a self-evolving agentic reasoning system that aims to address two bottlenecks in foundation model (FM) reasoning: limited model-intrinsic capacity and unreliable test-time iteration. AlphaApollo orchestrates multiple models with professional tools to enable deliberate, verifiable reasoning. It couples (i) a computation tool (Python with numerical and symbolic libraries) and (ii) a retrieval tool (task-relevant external information) to execute exact calculations and ground decisions. The system further supports multi-round, multi-model solution evolution via a shared state map that records candidates, executable checks, and feedback for iterative refinement. In evaluations on AIME 2024/2025 across multiple models, AlphaApollo delivers consistent gains: +5.15% Average@32 and +23.34% Pass@32 for Qwen2.5-14B-Instruct, and +8.91% Average@32 and +26.67% Pass@32 for Llama-3.3-70B-Instruct. Tool-use analysis shows that more than 80% of tool calls are successfully executed, with consistent outperformance of non-tool baselines, thereby lifting the capability ceiling of FMs. More empirical results and implementation details will be added at https://github.com/tmlr-group/AlphaApollo.
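The abstract reports Average@32 and Pass@32 without defining them. As a minimal sketch, assuming the common convention (Average@k is the mean accuracy over k sampled answers per problem, and Pass@k is the fraction of problems where at least one of the k samples is correct), the two metrics could be computed as follows. The function names and the toy data are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List


def average_at_k(results: Dict[str, List[bool]]) -> float:
    """Mean per-sample accuracy, averaged over problems (assumed definition)."""
    per_problem = [sum(samples) / len(samples) for samples in results.values()]
    return sum(per_problem) / len(per_problem)


def pass_at_k(results: Dict[str, List[bool]]) -> float:
    """Fraction of problems with at least one correct sample (assumed definition)."""
    solved = [any(samples) for samples in results.values()]
    return sum(solved) / len(solved)


# Hypothetical toy data: 3 problems, k = 4 samples each (not results from the paper).
toy_results = {
    "problem_1": [True, False, False, False],
    "problem_2": [False, False, False, False],
    "problem_3": [True, True, True, True],
}

print(f"Average@4 = {average_at_k(toy_results):.4f}")
print(f"Pass@4    = {pass_at_k(toy_results):.4f}")
```

Under these assumed definitions, a problem solved by only one of its k samples counts fully toward Pass@k but adds only 1/k to Average@k, which is one way a Pass@32 gain can be much larger than the corresponding Average@32 gain.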