Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow
December 31, 2025
Authors: Karthik Dharmarajan, Wenlong Huang, Jiajun Wu, Li Fei-Fei, Ruohan Zhang
cs.AI
Abstract
Generative video modeling has emerged as a compelling tool for zero-shot reasoning about plausible physical interactions in open-world manipulation. Yet it remains challenging to translate such human-led motions into the low-level actions demanded by robotic systems. We observe that, given an initial image and a task instruction, these models excel at synthesizing sensible object motions. Thus, we introduce Dream2Flow, a framework that bridges video generation and robotic control through 3D object flow as an intermediate representation. Our method reconstructs 3D object motions from generated videos and formulates manipulation as object trajectory tracking. By separating the state changes from the actuators that realize those changes, Dream2Flow overcomes the embodiment gap and enables zero-shot guidance from pre-trained video models to manipulate objects of diverse categories, including rigid, articulated, deformable, and granular objects. Through trajectory optimization or reinforcement learning, Dream2Flow converts reconstructed 3D object flow into executable low-level commands without task-specific demonstrations. Simulation and real-world experiments highlight 3D object flow as a general and scalable interface for adapting video generation models to open-world robotic manipulation. Videos and visualizations are available at https://dream2flow.github.io/.
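To make the trajectory-tracking formulation concrete, below is a minimal sketch, not the authors' implementation, of how a reconstructed 3D object flow could be tracked with sampling-based trajectory optimization. The dynamics function `step_fn`, the action dimensionality, and the choice of a cross-entropy-method optimizer are illustrative assumptions; the abstract only states that trajectory optimization or reinforcement learning converts the flow into low-level commands.

```python
# Minimal sketch (assumptions noted above): track a target 3D object flow
# by optimizing an action sequence so that simulated object keypoints
# follow the reconstructed trajectory. `step_fn` is a hypothetical
# dynamics model (simulator or learned), not part of the paper.
import numpy as np

def rollout_object_points(points0, actions, step_fn):
    """Roll out object keypoints (N, 3) under a sequence of actions."""
    points, traj = points0, []
    for a in actions:
        points = step_fn(points, a)   # hypothetical dynamics: (N,3), (A,) -> (N,3)
        traj.append(points)
    return np.stack(traj)             # (T, N, 3)

def flow_tracking_cost(pred_traj, target_flow):
    """Mean squared 3D distance between predicted and target object trajectories."""
    return np.mean(np.sum((pred_traj - target_flow) ** 2, axis=-1))

def cem_track_flow(points0, target_flow, step_fn, action_dim,
                   horizon, iters=10, pop=64, elite=8, seed=0):
    """Cross-entropy method over action sequences that best reproduce the flow."""
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, horizon, action_dim))
        costs = [flow_tracking_cost(rollout_object_points(points0, s, step_fn),
                                    target_flow) for s in samples]
        elites = samples[np.argsort(costs)[:elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu                          # optimized action sequence, shape (T, A)
```

In the same spirit, the cost above could instead serve as a (negative) reward for a reinforcement learning policy, matching the abstract's statement that either trajectory optimization or reinforcement learning can turn the reconstructed 3D object flow into executable commands.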