PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
September 24, 2025
Authors: Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu
cs.AI
Abstract
Existing video generation models excel at producing photo-realistic videos
from text or images, but often lack physical plausibility and 3D
controllability. To overcome these limitations, we introduce PhysCtrl, a novel
framework for physics-grounded image-to-video generation with physical
parameters and force control. At its core is a generative physics network that
learns the distribution of physical dynamics across four materials (elastic,
sand, plasticine, and rigid) via a diffusion model conditioned on physics
parameters and applied forces. We represent physical dynamics as 3D point
trajectories and train on a large-scale synthetic dataset of 550K animations
generated by physics simulators. We enhance the diffusion model with a
novel spatiotemporal attention block that emulates particle interactions,
and we incorporate physics-based constraints during training to enforce
physical plausibility. Experiments show that PhysCtrl generates realistic,
physics-grounded motion trajectories which, when used to drive image-to-video
models, yield high-fidelity, controllable videos that outperform existing
methods in both visual quality and physical plausibility. Project Page:
https://cwchenwang.github.io/physctrl
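
As a rough illustration of the spatiotemporal attention described in the abstract, below is a minimal sketch, not the authors' released code, of a block that alternates attention over points within a frame (loosely emulating particle-particle interactions) with attention over frames per point. It assumes a PyTorch implementation and a (batch, frames, points, channels) trajectory layout; the class name SpatioTemporalBlock and all hyperparameters are illustrative.

```python
# Hypothetical sketch of a spatiotemporal attention block over 3D point
# trajectories; the layout and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Alternates attention over points (space) and over frames (time)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, C) -- batch, frames, points, feature channels.
        B, T, N, C = x.shape
        # Spatial attention: points attend to each other within a frame.
        s = x.reshape(B * T, N, C)
        h = self.norm1(s)
        s = s + self.spatial_attn(h, h, h)[0]
        x = s.reshape(B, T, N, C)
        # Temporal attention: each point attends across all frames.
        t = x.permute(0, 2, 1, 3).reshape(B * N, T, C)
        h = self.norm2(t)
        t = t + self.temporal_attn(h, h, h)[0]
        x = t.reshape(B, N, T, C).permute(0, 2, 1, 3)
        # Pointwise feed-forward over each point feature.
        return x + self.mlp(self.norm3(x))

# Usage on dummy trajectories: 2 sequences, 16 frames, 1024 points, 128-dim.
block = SpatioTemporalBlock(dim=128)
traj = torch.randn(2, 16, 1024, 128)
out = block(traj)  # same shape: (2, 16, 1024, 128)
```

In the paper's setting, blocks like this would presumably be stacked inside the trajectory diffusion backbone, with physics parameters and applied forces injected as additional conditioning signals; the exact conditioning mechanism is not specified in the abstract.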