AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning

Abstract

We present AlphaApollo, a self-evolving agentic reasoning system that aims to address two bottlenecks in foundation model (FM) reasoning: limited model-intrinsic capacity and unreliable test-time iteration. AlphaApollo orchestrates multiple models with professional tools to enable deliberate, verifiable reasoning. It couples (i) a computation tool (Python with numerical and symbolic libraries) and (ii) a retrieval tool (task-relevant external information) to execute exact calculations and ground decisions. The system further supports multi-round, multi-model solution evolution via a shared state map that records candidates, executable checks, and feedback for iterative refinement. In evaluations on AIME 2024/2025 across multiple models, AlphaApollo delivers consistent gains: +5.15% Average@32 and +23.34% Pass@32 for Qwen2.5-14B-Instruct, and +8.91% Average@32 and +26.67% Pass@32 for Llama-3.3-70B-Instruct. Tool-use analysis shows that more than 80% of tool calls execute successfully and that the tool-augmented system consistently outperforms non-tool baselines, thereby lifting the capability ceiling of FMs. Further empirical results and implementation details will be posted at https://github.com/tmlr-group/AlphaApollo.
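The abstract reports Average@32 and Pass@32 without defining them. The following is a minimal sketch of how such metrics are commonly computed, assuming that Average@k is the mean accuracy over k sampled answers per problem and that Pass@k is the fraction of problems with at least one correct answer among the k samples. The function names and the toy data are illustrative assumptions, not taken from the AlphaApollo code base.

```python
from typing import Sequence


def average_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Mean per-sample accuracy, averaged over problems.

    results[i][j] is True if the j-th sampled answer to problem i is correct.
    Assumed reading of Average@k; the abstract does not spell it out.
    """
    return sum(sum(r) / len(r) for r in results) / len(results)


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems solved by at least one of the k samples.

    Assumed reading of Pass@k; the abstract does not spell it out.
    """
    return sum(any(r) for r in results) / len(results)


# Hypothetical outcomes for 3 problems with k = 4 samples each
# (made-up data for illustration, not results from the paper).
toy = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, False, True],
]

print(f"Average@4 = {average_at_k(toy):.4f}")
print(f"Pass@4    = {pass_at_k(toy):.4f}")
```

Under these definitions Pass@k can never be lower than Average@k, because a problem counts as solved as soon as a single sample is correct.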
