Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution
May 26, 2025
Authors: Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, Mengdi Wang
cs.AI
Abstract
Recent advances in large language models (LLMs) have enabled agents to
autonomously perform complex, open-ended tasks. However, many existing
frameworks depend heavily on manually predefined tools and workflows, which
hinder their adaptability, scalability, and generalization across domains. In
this work, we introduce Alita, a generalist agent designed around the principle
that "Simplicity is the ultimate sophistication," enabling scalable agentic
reasoning through minimal predefinition and maximal self-evolution. For minimal
predefinition, Alita is equipped with only one component for direct
problem-solving, making it much simpler and neater than previous approaches
that relied heavily on hand-crafted, elaborate tools and workflows. This clean
design enhances its potential to generalize to challenging questions, without
being limited by tools. For maximal self-evolution, we give Alita a suite of
general-purpose components with which it autonomously constructs, refines, and
reuses external capabilities by generating task-related Model Context Protocols
(MCPs) from open-source resources, which contributes to scalable
agentic reasoning. Notably, Alita achieves 75.15% pass@1 and 87.27% pass@3
accuracy on the GAIA benchmark validation dataset, which is top-ranking among
general-purpose agents, and 74.00% and 52.00% pass@1 on MathVista and PathVQA,
respectively, outperforming many agent systems with far greater complexity.
More details will be updated at https://github.com/CharlesQ9/Alita.
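
The abstract reports pass@1 and pass@3 without defining them. As a minimal sketch, assuming pass@k counts a task as solved when at least one of its first k attempts is correct (a common reading, not confirmed by the abstract), the metric could be computed as follows. The function name `pass_at_k` and the sample outcomes are hypothetical illustrations, not data from the paper.

```python
from typing import Sequence


def pass_at_k(attempts_per_task: Sequence[Sequence[bool]], k: int) -> float:
    """Return the fraction of tasks solved by at least one of the first k attempts.

    Assumption: each inner sequence holds the correctness of successive,
    independent attempts on one task.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts_per_task:
        raise ValueError("need at least one task")
    solved = sum(1 for attempts in attempts_per_task if any(attempts[:k]))
    return solved / len(attempts_per_task)


# Hypothetical outcomes for four tasks, three attempts each (not from the paper).
results = [
    [True, False, True],
    [False, False, True],
    [False, False, False],
    [True, True, True],
]

pass_at_1 = pass_at_k(results, 1)  # only the first attempt counts
pass_at_3 = pass_at_k(results, 3)  # any of the three attempts counts
print(f"pass@1 = {pass_at_1:.2%}, pass@3 = {pass_at_3:.2%}")
```

Under this reading, pass@3 can never be lower than pass@1 on the same set of attempts, which is consistent with the ordering of the two GAIA figures reported above.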