Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
March 18, 2025
Authors: NVIDIA, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng
cs.AI
Abstract
We introduce Cosmos-Transfer, a conditional world generation model that can
generate world simulations based on multiple spatial control inputs of various
modalities such as segmentation, depth, and edge. In the design, the spatial
conditional scheme is adaptive and customizable. It allows weighting different
conditional inputs differently at different spatial locations. This enables
highly controllable world generation and finds use in various world-to-world
transfer use cases, including Sim2Real. We conduct extensive evaluations to
analyze the proposed model and demonstrate its applications for Physical AI,
including robotics Sim2Real and autonomous vehicle data enrichment. We further
demonstrate an inference scaling strategy to achieve real-time world generation
with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the
field, we open-source our models and code at
https://github.com/nvidia-cosmos/cosmos-transfer1.
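
The idea of weighting different conditional inputs differently at different spatial locations can be illustrated with a toy sketch. This is not the released implementation: the function name `blend_controls`, the use of scalar maps in place of real feature tensors, and the rule that weights are rescaled wherever they sum to more than 1 are all assumptions made for illustration.

```python
from typing import Dict, List

Grid = List[List[float]]  # H x W map of scalar values (stand-in for real tensors)


def blend_controls(features: Dict[str, Grid], weights: Dict[str, Grid]) -> Grid:
    """Per-pixel weighted sum of control maps (hypothetical helper).

    Assumption for this sketch: where the weights at a pixel sum to more
    than 1 they are rescaled to sum to 1; otherwise they are used as given.
    """
    names = list(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(weights[n][y][x] for n in names)
            scale = 1.0 / total if total > 1.0 else 1.0
            out[y][x] = sum(
                scale * weights[n][y][x] * features[n][y][x] for n in names
            )
    return out


# Toy 1x2 "frame": the left pixel relies on depth only,
# the right pixel mixes depth and edge equally.
features = {"depth": [[2.0, 2.0]], "edge": [[4.0, 4.0]]}
weights = {"depth": [[1.0, 1.0]], "edge": [[0.0, 1.0]]}
blended = blend_controls(features, weights)
```

In this toy setup, a weight map of all zeros for one modality switches that control off everywhere, while a map that is nonzero only in one region applies that control only there.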