Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow
December 31, 2025
Authors: Karthik Dharmarajan, Wenlong Huang, Jiajun Wu, Li Fei-Fei, Ruohan Zhang
cs.AI
Abstract
Generative video modeling has emerged as a compelling tool for zero-shot reasoning about plausible physical interactions in open-world manipulation. Yet it remains a challenge to translate such generated motions into the low-level actions demanded by robotic systems. We observe that, given an initial image and a task instruction, these models excel at synthesizing sensible object motions. We therefore introduce Dream2Flow, a framework that bridges video generation and robotic control through 3D object flow as an intermediate representation. Our method reconstructs 3D object motions from generated videos and formulates manipulation as object trajectory tracking. By separating state changes from the actuators that realize them, Dream2Flow overcomes the embodiment gap and enables zero-shot guidance from pre-trained video models to manipulate objects of diverse categories, including rigid, articulated, deformable, and granular objects. Through trajectory optimization or reinforcement learning, Dream2Flow converts the reconstructed 3D object flow into executable low-level commands without task-specific demonstrations. Simulation and real-world experiments highlight 3D object flow as a general and scalable interface for adapting video generation models to open-world robotic manipulation. Videos and visualizations are available at https://dream2flow.github.io/.
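To make the "manipulation as object trajectory tracking" formulation concrete, the following is a minimal sketch, not the authors' implementation: given a reconstructed 3D object flow target of shape (T, N, 3) (T timesteps, N tracked object points), candidate action sequences are scored by how closely a rollout's object points follow that target, and a simple random-shooting trajectory optimizer picks the best sequence. The `rollout_object_points` callable is a hypothetical stand-in for a simulator or learned dynamics model and is not part of any released API.

```python
# Sketch: 3D object-flow tracking via random-shooting trajectory optimization.
import numpy as np


def flow_tracking_cost(rollout_points: np.ndarray, target_flow: np.ndarray) -> float:
    """Mean squared distance between rolled-out and target object points, both (T, N, 3)."""
    return float(np.mean(np.sum((rollout_points - target_flow) ** 2, axis=-1)))


def random_shooting(
    rollout_object_points,      # callable: actions (T, action_dim) -> object points (T, N, 3)
    target_flow: np.ndarray,    # reconstructed 3D object flow, shape (T, N, 3)
    action_dim: int,
    num_samples: int = 256,
    action_scale: float = 0.05,
    seed: int = 0,
):
    """Sample action sequences; return the one whose rollout best tracks the flow."""
    rng = np.random.default_rng(seed)
    horizon = target_flow.shape[0]
    best_actions, best_cost = None, np.inf
    for _ in range(num_samples):
        actions = action_scale * rng.standard_normal((horizon, action_dim))
        cost = flow_tracking_cost(rollout_object_points(actions), target_flow)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


if __name__ == "__main__":
    # Toy check: the "object" is a point cloud rigidly translated by each action.
    T, N = 10, 32
    points0 = np.random.default_rng(1).uniform(-0.1, 0.1, size=(N, 3))
    target_flow = points0[None] + np.linspace(0, 0.2, T)[:, None, None] * np.array([1.0, 0.0, 0.0])

    def toy_rollout(actions):
        offsets = np.cumsum(actions, axis=0)  # integrate per-step translations
        return points0[None] + offsets[:, None, :]

    actions, cost = random_shooting(toy_rollout, target_flow, action_dim=3)
    print("best tracking cost:", cost)
```

In the paper's setting, the random-shooting search would be replaced by the trajectory optimization or reinforcement learning described in the abstract, but the interface is the same: the reconstructed 3D object flow serves as the tracking target that any actuator-specific controller must follow.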