
MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts

November 25, 2025
Authors: Zilong Huang, Jun He, Xiaobin Huang, Ziyi Xiong, Yang Luo, Junyan Ye, Weijia Li, Yiping Chen, Ting Han
cs.AI

Abstract

Generating realistic 3D cities is fundamental to world models, virtual reality, and game development, where an ideal urban scene must simultaneously achieve stylistic diversity, fine granularity, and controllability. However, existing methods struggle to balance the creative flexibility of text-based generation with the object-level editability enabled by explicit structural representations. We introduce MajutsuCity, a natural-language-driven and aesthetically adaptive framework for synthesizing structurally consistent and stylistically diverse 3D urban scenes. MajutsuCity represents a city as a composition of controllable layouts, assets, and materials, and generates scenes through a four-stage pipeline. To extend controllability beyond initial generation, we further integrate MajutsuAgent, an interactive language-grounded editing agent that supports five object-level operations. To support photorealistic and customizable scene synthesis, we also construct MajutsuDataset, a high-quality multimodal dataset containing 2D semantic layouts and height maps, diverse 3D building assets, and curated PBR materials and skyboxes, each accompanied by detailed annotations. We also develop a practical set of evaluation metrics covering key dimensions such as structural consistency, scene complexity, material fidelity, and lighting atmosphere. Extensive experiments show that MajutsuCity reduces layout FID by 83.7% compared with CityDreamer and by 20.1% compared with CityCraft. Our method ranks first on all AQS and RDR scores, outperforming existing methods by a clear margin. These results establish MajutsuCity as the new state of the art in geometric fidelity, stylistic adaptability, and semantic controllability for 3D city generation. We hope our framework will inspire new avenues of research in 3D city generation. Our dataset and code will be released at https://github.com/LongHZ140516/MajutsuCity.
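The abstract reports FID gains as percentages but does not list the underlying FID values. The sketch below shows how such a figure is usually read, under the assumption (not confirmed by the abstract) that "reduces FID by X%" means a relative reduction (baseline − ours) / baseline of a lower-is-better score. Under that reading, an 83.7% reduction means MajutsuCity's layout FID is 16.3% of CityDreamer's. The helper names and the baseline value of 100.0 are hypothetical and chosen purely for illustration; they are not data from the paper.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional drop of a lower-is-better metric (e.g. FID) relative to a baseline.

    Hypothetical helper: assumes the paper's percentages are relative reductions.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline


def implied_score(baseline: float, reduction: float) -> float:
    """Score implied by a reported fractional reduction against a given baseline."""
    return baseline * (1.0 - reduction)


# Illustrative baseline FID of 100.0 (NOT a value reported in the paper).
baseline_fid = 100.0
ours = implied_score(baseline_fid, 0.837)  # 83.7% reduction vs. the baseline
print(f"{ours:.1f}")                                       # prints 16.3
print(f"{relative_reduction(baseline_fid, ours):.3f}")     # prints 0.837
```

The same reading applied to the CityCraft comparison would put MajutsuCity's layout FID at 79.9% of CityCraft's.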
December 1, 2025