
World Craft: Agentic Framework to Create Visualizable Worlds via Text

January 14, 2026
Authors: Jianwen Sun, Yukang Feng, Kaining Ying, Chuanhao Li, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Yifan Chang, Yu Dai, Yifei Huang, Kaipeng Zhang
cs.AI

Abstract

Large Language Models (LLMs) have driven generative agent simulations (e.g., AI Town) that create a "dynamic world," holding immense value for both entertainment and research. However, for non-experts, especially those without programming skills, customizing a visualizable environment on their own is difficult. In this paper, we introduce World Craft, an agentic world-creation framework that builds an executable, visualizable AI Town from users' textual descriptions. It consists of two main modules: World Scaffold and World Guild. World Scaffold is a structured, concise standard for developing interactive game scenes, serving as efficient scaffolding that lets LLMs customize an executable AI Town-like environment. World Guild is a multi-agent framework that progressively analyzes user intent from rough descriptions and synthesizes the structured content (e.g., environment layout and assets) required by World Scaffold. Moreover, we construct a high-quality error-correction dataset via reverse engineering to enhance spatial knowledge and improve the stability and controllability of layout generation, and we report multi-dimensional evaluation metrics for further analysis. Extensive experiments demonstrate that our framework significantly outperforms existing commercial code agents (Cursor and Antigravity) and LLMs (Qwen3 and Gemini-3-Pro) in scene construction and narrative intent conveyance, providing a scalable solution for democratizing environment creation.
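The abstract says the error-correction dataset is built "via reverse engineering" but does not describe the procedure or the layout format. The sketch below shows one generic way such data could be produced, and it may differ from the authors' method: start from a known-valid layout, inject a detectable defect (an overlap or an out-of-bounds placement), and keep the (corrupted layout, detected errors, original layout) triple as a correction example. The rectangle-on-a-grid representation and every name here (`Obj`, `layout_errors`, `corrupt`, `make_correction_pairs`) are hypothetical assumptions, not the World Scaffold format.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical layout model (an assumption, not World Scaffold's format):
# each object is an axis-aligned rectangle on a width x height tile grid.
@dataclass(frozen=True)
class Obj:
    name: str
    x: int
    y: int
    w: int
    h: int

def overlaps(a: Obj, b: Obj) -> bool:
    # Two rectangles overlap if their extents intersect on both axes.
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h

def layout_errors(objs, width, height):
    """Return human-readable defects: out-of-bounds objects and pairwise overlaps."""
    errors = []
    for o in objs:
        if o.x < 0 or o.y < 0 or o.x + o.w > width or o.y + o.h > height:
            errors.append(f"{o.name} out of bounds")
    for i, a in enumerate(objs):
        for b in objs[i + 1:]:
            if overlaps(a, b):
                errors.append(f"{a.name} overlaps {b.name}")
    return errors

def corrupt(objs, width, height, rng):
    """Inject one defect into a copy of the layout; the input is not modified."""
    objs = list(objs)
    i = rng.randrange(len(objs))
    if len(objs) > 1 and rng.random() < 0.5:
        # Move object i onto object j's origin, forcing an overlap.
        j = rng.choice([k for k in range(len(objs)) if k != i])
        objs[i] = replace(objs[i], x=objs[j].x, y=objs[j].y)
    else:
        # Push object i past the right edge of the map.
        objs[i] = replace(objs[i], x=width)
    return objs

def make_correction_pairs(valid_layout, width, height, n, seed=0):
    """Build (corrupted, errors, target) triples from one known-good layout."""
    assert not layout_errors(valid_layout, width, height), "seed layout must be valid"
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        bad = corrupt(valid_layout, width, height, rng)
        errs = layout_errors(bad, width, height)
        if errs:  # keep only samples whose injected defect is detectable
            pairs.append((bad, errs, list(valid_layout)))
    return pairs

# Usage: a small hypothetical town on a 10 x 10 grid.
town = [Obj("house", 0, 0, 3, 3), Obj("well", 5, 5, 1, 1), Obj("shop", 7, 0, 3, 2)]
pairs = make_correction_pairs(town, width=10, height=10, n=4)
for bad, errs, target in pairs:
    print(errs)
```

Working backward from a valid layout means the correct answer for every corrupted sample is known by construction, so no manual labeling is needed; a model trained on such triples would learn to map a defective layout plus its error report to a repaired one.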
January 29, 2026