PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
September 27, 2024
Authors: Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, Shenlong Wang
cs.AI
Abstract
We present PhysGen, a novel image-to-video generation method that converts a
single image and an input condition (e.g., force and torque applied to an
object in the image) to produce a realistic, physically plausible, and
temporally consistent video. Our key insight is to integrate model-based
physical simulation with a data-driven video generation process, enabling
plausible image-space dynamics. At the heart of our system are three core
components: (i) an image understanding module that effectively captures the
geometry, materials, and physical parameters of the image; (ii) an image-space
dynamics simulation model that utilizes rigid-body physics and inferred
parameters to simulate realistic behaviors; and (iii) an image-based rendering
and refinement module that leverages generative video diffusion to produce
realistic video footage featuring the simulated motion. The resulting videos
are realistic in both physics and appearance and are precisely controllable;
quantitative comparisons and a comprehensive user study show superior results
over existing data-driven image-to-video generation methods. PhysGen's
resulting videos can be used for various
downstream applications, such as turning an image into a realistic animation or
allowing users to interact with the image and create various dynamics. Project
page: https://stevenlsw.github.io/physgen/
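To make the dynamics component concrete: the abstract describes simulating rigid-body motion from an input force and torque. Below is a minimal, self-contained sketch of what such an image-space update could look like, using semi-implicit Euler integration of a single 2D rigid body. All names (`RigidBody2D`, `step`) and the integrator choice are illustrative assumptions, not PhysGen's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class RigidBody2D:
    # Hypothetical minimal state for one object segmented from the image;
    # mass/inertia stand in for the "inferred physical parameters".
    mass: float
    inertia: float
    x: float = 0.0; y: float = 0.0; theta: float = 0.0   # pose
    vx: float = 0.0; vy: float = 0.0; omega: float = 0.0  # velocities

def step(body: RigidBody2D, fx: float, fy: float, torque: float, dt: float) -> RigidBody2D:
    """Semi-implicit Euler: update velocities from the applied force/torque,
    then update the pose with the new velocities."""
    body.vx += fx / body.mass * dt
    body.vy += fy / body.mass * dt
    body.omega += torque / body.inertia * dt
    body.x += body.vx * dt
    body.y += body.vy * dt
    body.theta += body.omega * dt
    return body

# Apply a constant rightward force of 4 N to a 2 kg body for 1 s at 60 fps.
b = RigidBody2D(mass=2.0, inertia=0.5)
for _ in range(60):
    step(b, fx=4.0, fy=0.0, torque=0.0, dt=1 / 60)
print(round(b.vx, 6))  # → 2.0, matching v = (F/m) * t
```

In the paper's pipeline, the per-frame poses produced by a simulator like this would then drive the rendering and diffusion-based refinement stage; this sketch only illustrates the integration step.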