PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

September 24, 2025
Authors: Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu
cs.AI

Abstract

Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporates physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform existing methods in both visual quality and physical plausibility. Project Page: https://cwchenwang.github.io/physctrl
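The key architectural component named in the abstract is a spatiotemporal attention block over 3D point trajectories that emulates particle interactions. Below is a minimal PyTorch sketch of such a block, factored as spatial attention across points within a frame followed by temporal attention across frames for each point. The pre-norm layout, layer sizes, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Sketch of a spatiotemporal attention block for point trajectories.

    Spatial attention lets points within one frame attend to each other
    (emulating particle interactions); temporal attention lets each point
    attend across frames. Pre-norm layout and sizes are assumptions.
    """

    def __init__(self, dim: int = 128, heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, points, dim) features of noisy 3D point trajectories
        b, t, n, d = x.shape

        # Spatial attention within each frame: points attend to points.
        xs = x.reshape(b * t, n, d)
        h = self.norm1(xs)
        xs = xs + self.spatial_attn(h, h, h, need_weights=False)[0]

        # Temporal attention per point: frames attend to frames.
        xt = xs.reshape(b, t, n, d).permute(0, 2, 1, 3).reshape(b * n, t, d)
        h = self.norm2(xt)
        xt = xt + self.temporal_attn(h, h, h, need_weights=False)[0]

        x = xt.reshape(b, n, t, d).permute(0, 2, 1, 3)
        # Position-wise feed-forward with residual connection.
        return x + self.mlp(self.norm3(x))
```

Conditioning on physics parameters (e.g., a material stiffness value) and the applied force could be injected as extra tokens or additive embeddings before such blocks; the abstract does not specify the mechanism.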
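The abstract also mentions physics-based constraints incorporated during training. One plausible form, shown purely as an illustration (the paper's actual losses are not given in the abstract), is a Newton's-second-law residual on the predicted trajectories, computed with finite differences:

```python
import torch

def newton_residual_loss(traj: torch.Tensor, force: torch.Tensor,
                         mass: float = 1.0, dt: float = 1.0 / 24.0) -> torch.Tensor:
    """Hypothetical physics-based constraint: an F = ma residual on trajectories.

    traj:  (batch, frames, points, 3) predicted 3D point positions
    force: (batch, 3) constant external force applied to the object
    mass and dt are assumed constants; a real material model would need
    per-point masses and material-specific internal forces.
    """
    # Finite-difference velocity and acceleration along the time axis.
    vel = (traj[:, 1:] - traj[:, :-1]) / dt   # (batch, frames-1, points, 3)
    acc = (vel[:, 1:] - vel[:, :-1]) / dt     # (batch, frames-2, points, 3)

    # The center-of-mass acceleration should match F / m under a constant
    # force (exact only for net motion; deformation is unconstrained here).
    com_acc = acc.mean(dim=2)                 # (batch, frames-2, 3)
    target = (force / mass).unsqueeze(1)      # (batch, 1, 3)
    return ((com_acc - target) ** 2).mean()
```

In a diffusion setup, a term like this would typically be evaluated on the model's predicted clean trajectories and added to the denoising objective with a weighting coefficient.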