Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

April 20, 2026
Authors: Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou
cs.AI

Abstract

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limited by the lack of realistic environments and principled mechanisms for lifelong learning. In this paper, we present Agent-World, a self-evolving training arena for advancing general agent intelligence through scalable environments. Agent-World has two main components: (1) Agentic Environment-Task Discovery, which autonomously explores topic-aligned databases and executable tool ecosystems from thousands of real-world environment themes and synthesizes verifiable tasks with controllable difficulty; and (2) Continuous Self-Evolving Agent Training, which combines multi-environment reinforcement learning with a self-evolving agent arena that automatically identifies capability gaps through dynamic task synthesis and drives targeted learning, enabling the co-evolution of agent policies and environments. Across 23 challenging agent benchmarks, Agent-World-8B and Agent-World-14B consistently outperform strong proprietary models and environment scaling baselines. Further analyses reveal scaling trends with respect to environment diversity and the number of self-evolution rounds, offering insights for building general agent intelligence.
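
The second component (find capability gaps, synthesize targeted tasks, train on them) can be pictured as a loop. The following is only a toy sketch of that idea and is not the paper's implementation: every name here (`Task`, `success_rates`, `weakest_env`, `synthesize_tasks`, `train`, `self_evolve`) is hypothetical, the "policy" is reduced to one skill number per environment, the verifier is a simple threshold check, and the "RL update" is a stand-in that nudges skill toward task difficulty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    env: str           # environment theme the task belongs to
    difficulty: float  # in [0, 1]; higher is harder


def success_rates(skill, tasks):
    """Fraction of tasks solved per environment by the toy policy.

    A task counts as solved when the policy's skill in that environment
    is at least the task difficulty (a stand-in for a real verifier).
    """
    solved, total = {}, {}
    for t in tasks:
        total[t.env] = total.get(t.env, 0) + 1
        solved[t.env] = solved.get(t.env, 0) + (skill.get(t.env, 0.0) >= t.difficulty)
    return {env: solved[env] / total[env] for env in total}


def weakest_env(rates):
    """Environment with the lowest success rate (the capability gap)."""
    return min(rates, key=rates.get)


def synthesize_tasks(env, skill_level, n=3, step=0.1):
    """Assumed task synthesis: new tasks slightly above the current skill."""
    return [Task(env, min(1.0, skill_level + step * (i + 1))) for i in range(n)]


def train(skill, tasks, lr=0.5):
    """Toy update: move each environment's skill toward mean task difficulty."""
    updated = dict(skill)  # do not mutate the caller's dict
    by_env = {}
    for t in tasks:
        by_env.setdefault(t.env, []).append(t.difficulty)
    for env, difficulties in by_env.items():
        target = sum(difficulties) / len(difficulties)
        current = updated.get(env, 0.0)
        updated[env] = current + lr * max(0.0, target - current)
    return updated


def self_evolve(skill, eval_tasks, rounds=3):
    """Run evaluate -> find gap -> synthesize -> train for a few rounds."""
    history = []
    for _ in range(rounds):
        rates = success_rates(skill, eval_tasks)
        gap = weakest_env(rates)
        new_tasks = synthesize_tasks(gap, skill.get(gap, 0.0))
        skill = train(skill, new_tasks)
        history.append((gap, rates[gap]))
    return skill, history
```

Calling `self_evolve({"calendar": 0.8, "sql": 0.2}, eval_tasks)` with a small list of `Task` objects returns the updated skills and, per round, which environment was targeted and its success rate at that point. A real system would replace the threshold check with executable task verification and the skill update with multi-environment reinforcement learning.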