
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

April 20, 2026
作者: Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou
cs.AI

Abstract

Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limited by the lack of realistic environments and principled mechanisms for life-long learning. In this paper, we present Agent-World, a self-evolving training arena for advancing general agent intelligence through scalable environments. Agent-World has two main components: (1) Agentic Environment-Task Discovery, which autonomously explores topic-aligned databases and executable tool ecosystems across thousands of real-world environment themes and synthesizes verifiable tasks with controllable difficulty; and (2) Continuous Self-Evolving Agent Training, which combines multi-environment reinforcement learning with a self-evolving agent arena that automatically identifies capability gaps through dynamic task synthesis and drives targeted learning, enabling the co-evolution of agent policies and environments. Across 23 challenging agent benchmarks, Agent-World-8B and Agent-World-14B consistently outperform strong proprietary models and environment-scaling baselines. Further analyses reveal scaling trends with respect to environment diversity and self-evolution rounds, offering insights for building general agent intelligence.