Code2Worlds: Empowering Coding LLMs for 4D World Generation

Abstract

Achieving spatial intelligence requires moving beyond visual plausibility to build world simulators grounded in physical laws. While coding LLMs have advanced static 3D scene generation, extending this paradigm to 4D dynamics remains a critical frontier. The task presents two fundamental challenges: multi-scale context entanglement, where monolithic generation fails to balance local object structures with global environmental layouts; and a semantic-physical execution gap, where open-loop code generation leads to physical hallucinations that lack dynamic fidelity. We introduce Code2Worlds, a framework that formulates 4D generation as language-to-simulation code generation. First, we propose a dual-stream architecture that disentangles retrieval-augmented object generation from hierarchical environmental orchestration. Second, to ensure dynamic fidelity, we establish a physics-aware closed-loop mechanism in which a PostProcess Agent scripts the dynamics and a VLM-Motion Critic performs self-reflection to iteratively refine the simulation code. Evaluations on the Code4D benchmark show that Code2Worlds outperforms baselines with a 41% SGS gain and 49% higher Richness, while uniquely generating physics-aware dynamics absent from prior static methods. Code: https://github.com/AIGeeksGroup/Code2Worlds. Website: https://aigeeksgroup.github.io/Code2Worlds.
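The closed-loop mechanism described in the abstract can be pictured as a write, simulate, critique, revise cycle. The sketch below is only an illustration of that control flow under stated assumptions: `refine_simulation_code`, `Critique`, the three callables, and `max_rounds` are hypothetical names introduced here, not the Code2Worlds API, and the abstract does not specify how the agents are actually invoked or when the loop stops.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Critique:
    """Verdict from a (hypothetical) motion critic on one rollout."""
    passed: bool
    feedback: str = ""


def refine_simulation_code(
    prompt: str,
    write_script: Callable[[str, Optional[str], Optional[str]], str],
    simulate: Callable[[str], object],
    critique: Callable[[str, object], Critique],
    max_rounds: int = 3,  # assumed stopping rule, not taken from the paper
) -> Tuple[str, int]:
    """Iteratively revise a dynamics script until the critic accepts it.

    write_script(prompt, previous_script, feedback) stands in for the
    agent that scripts the dynamics; critique(prompt, rollout) stands in
    for the vision-language critic. Returns the last script and the
    number of rounds used.
    """
    script: Optional[str] = None
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        # Write a first draft, or revise the previous one using feedback.
        script = write_script(prompt, script, feedback)
        # Run the script to obtain something the critic can inspect.
        rollout = simulate(script)
        verdict = critique(prompt, rollout)
        if verdict.passed:
            return script, round_idx
        feedback = verdict.feedback
    # Budget exhausted: return the most recent attempt.
    return script, max_rounds
```

Passing the agents in as plain callables keeps the loop independent of any particular LLM, VLM, or simulator backend, which is a design choice of this sketch rather than something the abstract states.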
