Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

Abstract

Recent advances in large language models (LLMs) have enabled agents to autonomously perform complex, open-ended tasks. However, many existing frameworks depend heavily on manually predefined tools and workflows, which hinders their adaptability, scalability, and generalization across domains. In this work, we introduce Alita, a generalist agent designed on the principle that "Simplicity is the ultimate sophistication," which enables scalable agentic reasoning through minimal predefinition and maximal self-evolution. For minimal predefinition, Alita is equipped with only one component for direct problem-solving, making it much simpler and neater than previous approaches that rely heavily on hand-crafted, elaborate tools and workflows. This clean design enhances its potential to generalize to challenging questions without being limited by its tools. For maximal self-evolution, we enable Alita's creativity by providing a suite of general-purpose components with which it autonomously constructs, refines, and reuses external capabilities by generating task-related Model Context Protocols (MCPs) from open-source resources, which contributes to scalable agentic reasoning. Notably, Alita achieves 75.15% pass@1 and 87.27% pass@3 accuracy on the GAIA benchmark validation dataset, which is top-ranking among general-purpose agents, and 74.00% and 52.00% pass@1 on MathVista and PathVQA, respectively, outperforming many agent systems with far greater complexity. More details will be updated at https://github.com/CharlesQ9/Alita.