
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

March 18, 2025
Authors: NVIDIA, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng
cs.AI

Abstract

We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy to achieve real-time world generation with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
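To make the idea of weighting different conditional inputs differently at different spatial locations more concrete, here is a minimal NumPy sketch. It is an illustration only and not the released Cosmos-Transfer1 code: the function name `fuse_controls`, the dictionary layout, the feature shapes, and the per-pixel normalization step are all assumptions made for this example.

```python
import numpy as np

def fuse_controls(features, weights):
    """Blend per-modality feature maps using per-pixel weight maps.

    Hypothetical helper for illustration; not part of any released API.
    features: dict of modality name -> array of shape (H, W, C)
    weights:  dict of modality name -> non-negative array of shape (H, W)
    """
    names = sorted(features)
    f = np.stack([features[n] for n in names]).astype(np.float64)  # (N, H, W, C)
    w = np.stack([weights[n] for n in names]).astype(np.float64)   # (N, H, W)

    # Assumption: rescale so the weights at each pixel sum to 1.
    # Pixels where every weight is 0 are left at 0.
    total = w.sum(axis=0, keepdims=True)
    w = w / np.where(total > 0, total, 1.0)

    # Weighted sum over the modality axis.
    return (w[..., None] * f).sum(axis=0)

# Toy inputs: depth drives the left half of the frame, segmentation the right.
H, W, C = 4, 6, 3
features = {
    "depth": np.full((H, W, C), 1.0),
    "edge": np.full((H, W, C), 0.0),
    "seg": np.full((H, W, C), 2.0),
}
left = np.zeros((H, W))
left[:, : W // 2] = 1.0
weights = {"depth": left, "edge": np.zeros((H, W)), "seg": 1.0 - left}

fused = fuse_controls(features, weights)
```

In this toy setup the left half of `fused` follows the depth features and the right half follows the segmentation features. Setting all three weight maps to the same constant would instead give a uniform average of the three modalities.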
