Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

May 26, 2025
Authors: Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, Mengdi Wang
cs.AI

Abstract

Recent advances in large language models (LLMs) have enabled agents to autonomously perform complex, open-ended tasks. However, many existing frameworks depend heavily on manually predefined tools and workflows, which hinders their adaptability, scalability, and generalization across domains. In this work, we introduce Alita, a generalist agent designed on the principle that "Simplicity is the ultimate sophistication," which enables scalable agentic reasoning through minimal predefinition and maximal self-evolution. For minimal predefinition, Alita is equipped with only one component for direct problem-solving, making it much simpler than previous approaches that rely heavily on elaborate, hand-crafted tools and workflows. This clean design enhances its potential to generalize to challenging questions without being limited by tools. For maximal self-evolution, we provide Alita with a suite of general-purpose components that let it autonomously construct, refine, and reuse external capabilities by generating task-related model context protocols (MCPs) from open-source resources, which contributes to scalable agentic reasoning. Notably, Alita achieves 75.15% pass@1 and 87.27% pass@3 accuracy on the GAIA benchmark validation dataset, ranking among the top general-purpose agents, and 74.00% and 52.00% pass@1 on MathVista and PathVQA, respectively, outperforming many agent systems with far greater complexity. More details will be updated at https://github.com/CharlesQ9/Alita.
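
The abstract reports pass@1 and pass@3 without defining them. The sketch below shows one common reading, assumed here and not confirmed by the source: a task counts as solved at k if at least one of its first k attempts is correct, and the score is the fraction of solved tasks. The function name `pass_at_k` and the toy attempt data are hypothetical and are not Alita's evaluation code or results.

```python
from typing import Sequence


def pass_at_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks with at least one correct result in the first k attempts.

    Assumption: this is the "any of k attempts succeeds" reading of pass@k;
    the source does not state which definition it uses.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("attempts must contain at least one task")
    solved = sum(1 for task in attempts if any(task[:k]))
    return solved / len(attempts)


# Hypothetical toy data: 4 tasks, 3 attempts each (True = correct answer).
toy = [
    [True, False, False],   # solved on the first attempt
    [False, False, True],   # solved only on the third attempt
    [False, False, False],  # never solved
    [False, True, True],    # solved from the second attempt on
]

print(pass_at_k(toy, 1))  # 0.25
print(pass_at_k(toy, 3))  # 0.75
```

Under this reading, pass@3 can never be lower than pass@1 on the same set of tasks, which is consistent with the gap between the two GAIA figures reported above.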
