
RealWonder: Real-Time Physical Action-Conditioned Video Generation

March 5, 2026
Authors: Wei Liu, Ziyu Chen, Zizhang Li, Yue Wang, Hong-Xing Yu, Jiajun Wu
cs.AI

Abstract

Current video generation models cannot simulate the physical consequences of 3D actions, such as applied forces and robotic manipulation, because they lack a structural understanding of how actions affect 3D scenes. We present RealWonder, the first real-time system for action-conditioned video generation from a single image. Our key insight is to use physics simulation as an intermediate bridge: instead of directly encoding continuous actions, we translate them through physics simulation into visual representations (optical flow and RGB) that video models can process. RealWonder integrates three components: 3D reconstruction from a single image, physics simulation, and a distilled video generator requiring only 4 diffusion steps. Our system achieves 13.2 FPS at 480×832 resolution, enabling interactive exploration of forces, robot actions, and camera controls on rigid objects, deformable bodies, fluids, and granular materials. We envision that RealWonder opens new opportunities for applying video models in immersive experiences, AR/VR, and robot learning. Our code and model weights are publicly available on our project website: https://liuwei283.github.io/RealWonder/
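The three-stage pipeline described above (single-image 3D reconstruction → physics simulation → conditioned few-step diffusion) can be sketched as a minimal data-flow skeleton. This is an illustrative sketch only: all function names and internals are hypothetical placeholders, not the authors' implementation; only the resolution (480×832), the 4-step distilled generator, and the use of optical flow + RGB as conditioning signals come from the abstract.

```python
# Hypothetical sketch of the three-stage pipeline from the abstract.
# All function bodies are stubs; names and shapes are illustrative assumptions.
import numpy as np

H, W = 480, 832          # output resolution stated in the abstract
NUM_DIFFUSION_STEPS = 4  # distilled generator step count stated in the abstract

def reconstruct_3d(image):
    """Stage 1 (stub): build a 3D scene representation from a single image."""
    return {"geometry": np.zeros((H, W), dtype=np.float32), "appearance": image}

def simulate_physics(scene, action):
    """Stage 2 (stub): run the continuous action through a physics engine,
    then render the result as optical-flow and RGB conditioning signals."""
    flow = np.full((H, W, 2), action["force"], dtype=np.float32)
    rgb = scene["appearance"]
    return flow, rgb

def generate_video_frame(flow, rgb):
    """Stage 3 (stub): a few-step distilled diffusion generator conditioned on
    the simulated flow/RGB rather than on raw action vectors."""
    frame = np.zeros_like(rgb)
    for _ in range(NUM_DIFFUSION_STEPS):
        frame = 0.5 * frame + 0.5 * rgb  # placeholder denoising update
    return frame

# Data flow: image -> 3D scene -> (flow, RGB) conditions -> generated frame.
image = np.ones((H, W, 3), dtype=np.float32)
scene = reconstruct_3d(image)
flow, rgb = simulate_physics(scene, {"force": 1.0})
frame = generate_video_frame(flow, rgb)
print(frame.shape)  # (480, 832, 3)
```

The point of the sketch is the key design choice the abstract highlights: the video model never sees the raw action vector, only its visually rendered physical consequences, which keeps the generator's conditioning in a modality it already understands.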