

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

September 24, 2025
Authors: Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu
cs.AI

Abstract

Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporates physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform existing methods in both visual quality and physical plausibility. Project Page: https://cwchenwang.github.io/physctrl
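To make the abstract's key architectural detail concrete, below is a minimal sketch of a factorized spatiotemporal attention block over 3D point trajectories: attention runs across particles within each frame (emulating particle interactions), then across time steps for each particle. This is an illustration under stated assumptions, not the paper's released implementation; the tensor layout (batch, time, particles, feature), module names, and dimensions are all hypothetical.

# Minimal sketch (assumed design, not the authors' code) of a factorized
# spatiotemporal attention block for point-trajectory features.
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Attention across particles within a frame (spatial interaction term).
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Attention across time steps for each particle (temporal term).
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, particles, dim) -- per-point trajectory features.
        b, t, p, d = x.shape

        # Spatial pass: fold time into the batch, attend across particles.
        h = x.reshape(b * t, p, d)
        q = self.norm1(h)
        h = h + self.spatial_attn(q, q, q, need_weights=False)[0]
        x = h.reshape(b, t, p, d)

        # Temporal pass: fold particles into the batch, attend across time.
        h = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        q = self.norm2(h)
        h = h + self.temporal_attn(q, q, q, need_weights=False)[0]
        return h.reshape(b, p, t, d).permute(0, 2, 1, 3)

# Toy usage: a batch of 2 trajectories, 24 frames, 1024 points, 256-dim features.
if __name__ == "__main__":
    block = SpatioTemporalBlock()
    x = torch.randn(2, 24, 1024, 256)
    print(block(x).shape)  # torch.Size([2, 24, 1024, 256])

Factorizing attention this way, if it matches the paper's design, keeps cost at roughly O(p^2 + t^2) per block rather than O((t*p)^2) for joint attention over all trajectory tokens, which matters at the scale of the 550K-animation training set the abstract describes.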