SAGE: Scalable Agentic 3D Scene Generation for Embodied AI
February 10, 2026
Authors: Hongchi Xia, Xuan Li, Zhaoshuo Li, Qianli Ma, Jiashu Xu, Ming-Yu Liu, Yin Cui, Tsung-Yi Lin, Wei-Chiu Ma, Shenlong Wang, Shuran Song, Fangyin Wei
cs.AI
Abstract
Real-world data collection for embodied agents remains costly and unsafe, calling for scalable, realistic, and simulator-ready 3D environments. However, existing scene-generation systems often rely on rule-based or task-specific pipelines, yielding artifacts and physically invalid scenes. We present SAGE, an agentic framework that, given a user-specified embodied task (e.g., "pick up a bowl and place it on the table"), understands the intent and automatically generates simulation-ready environments at scale. The agent couples multiple generators for layout and object composition with critics that evaluate semantic plausibility, visual realism, and physical stability. Through iterative reasoning and adaptive tool selection, it self-refines the scenes until they meet the user's intent and physical validity. The resulting environments are realistic, diverse, and directly deployable in modern simulators for policy training. Policies trained purely on this data exhibit clear scaling trends and generalize to unseen objects and layouts, demonstrating the promise of simulation-driven scaling for embodied AI. Code, demos, and the SAGE-10k dataset can be found on the project page: https://nvlabs.github.io/sage.