
SAGE: Scalable Agentic 3D Scene Generation for Embodied AI

February 10, 2026
Authors: Hongchi Xia, Xuan Li, Zhaoshuo Li, Qianli Ma, Jiashu Xu, Ming-Yu Liu, Yin Cui, Tsung-Yi Lin, Wei-Chiu Ma, Shenlong Wang, Shuran Song, Fangyin Wei
cs.AI

Abstract

Real-world data collection for embodied agents remains costly and unsafe, calling for scalable, realistic, and simulator-ready 3D environments. However, existing scene-generation systems often rely on rule-based or task-specific pipelines, yielding artifacts and physically invalid scenes. We present SAGE, an agentic framework that, given a user-specified embodied task (e.g., "pick up a bowl and place it on the table"), understands the intent and automatically generates simulation-ready environments at scale. The agent couples multiple generators for layout and object composition with critics that evaluate semantic plausibility, visual realism, and physical stability. Through iterative reasoning and adaptive tool selection, it self-refines the scenes until they meet the user's intent and are physically valid. The resulting environments are realistic, diverse, and directly deployable in modern simulators for policy training. Policies trained purely on this data exhibit clear scaling trends and generalize to unseen objects and layouts, demonstrating the promise of simulation-driven scaling for embodied AI. Code, demos, and the SAGE-10k dataset can be found on the project page: https://nvlabs.github.io/sage.
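The abstract describes a generate-critique-refine loop: generators propose a layout and object composition, critics score semantic plausibility, visual realism, and physical stability, and the agent revises until all checks pass. The sketch below illustrates that control flow only; all names (`Scene`, `generate_layout`, `compose_objects`, the critic functions, and the thresholds) are hypothetical placeholders, not SAGE's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    """Hypothetical container for a generated 3D scene (layout plus placed objects)."""
    layout: dict = field(default_factory=dict)
    objects: list = field(default_factory=list)

def generate_layout(task: str) -> Scene:
    """Placeholder layout generator: proposes a room layout for the given task."""
    return Scene(layout={"task": task, "rooms": ["kitchen"]})

def compose_objects(scene: Scene, task: str) -> Scene:
    """Placeholder object-composition generator: adds task-relevant assets to the scene."""
    scene.objects.append({"name": "bowl", "support": "table"})
    return scene

def run_critics(scene: Scene) -> dict:
    """Placeholder critics: return scores for semantics, visual realism, and physics."""
    return {"semantic": 0.9, "visual": 0.8, "physics": 0.95}

def refine_scene(task: str, max_iters: int = 5, threshold: float = 0.85) -> Scene:
    """Generate-critique-refine loop: keep revising until every critic score passes."""
    scene = compose_objects(generate_layout(task), task)
    for _ in range(max_iters):
        scores = run_critics(scene)
        if all(s >= threshold for s in scores.values()):
            break  # scene satisfies all critics
        # Adaptive tool selection (illustrative): rework the aspect with the weakest score.
        weakest = min(scores, key=scores.get)
        if weakest == "semantic":
            scene = compose_objects(generate_layout(task), task)  # regenerate layout and objects
        else:
            scene = compose_objects(scene, task)  # adjust object composition only
    return scene

if __name__ == "__main__":
    print(refine_scene("pick up a bowl and place it on the table"))
```

With real generators and critics substituted for the stubs, the same loop structure would iterate until the scene is semantically plausible, visually realistic, and physically stable, as the abstract describes.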