PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
September 24, 2025
Authors: Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu
cs.AI
Abstract
Existing video generation models excel at producing photo-realistic videos
from text or images, but often lack physical plausibility and 3D
controllability. To overcome these limitations, we introduce PhysCtrl, a novel
framework for physics-grounded image-to-video generation with physical
parameters and force control. At its core is a generative physics network that
learns the distribution of physical dynamics across four materials (elastic,
sand, plasticine, and rigid) via a diffusion model conditioned on physics
parameters and applied forces. We represent physical dynamics as 3D point
trajectories and train on a large-scale synthetic dataset of 550K animations
generated by physics simulators. We enhance the diffusion model with a
novel spatiotemporal attention block that emulates particle interactions,
and we incorporate physics-based constraints during training to enforce
physical plausibility. Experiments show that PhysCtrl generates realistic,
physics-grounded motion trajectories which, when used to drive image-to-video
models, yield high-fidelity, controllable videos that outperform existing
methods in both visual quality and physical plausibility. Project Page:
https://cwchenwang.github.io/physctrl
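
As a rough illustration of the spatiotemporal attention described in the abstract, below is a minimal sketch, not the authors' released code, of a block that alternates attention over points within a frame (loosely emulating particle-particle interactions) with attention over frames per point. It assumes a PyTorch implementation and a (batch, frames, points, channels) trajectory layout; the class name SpatioTemporalBlock and all hyperparameters are illustrative.

```python
# Hypothetical sketch of a spatiotemporal attention block over 3D point
# trajectories; the layout and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Alternates attention over points (space) and over frames (time)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, C) -- batch, frames, points, feature channels.
        B, T, N, C = x.shape
        # Spatial attention: points attend to each other within a frame.
        s = x.reshape(B * T, N, C)
        h = self.norm1(s)
        s = s + self.spatial_attn(h, h, h)[0]
        x = s.reshape(B, T, N, C)
        # Temporal attention: each point attends across all frames.
        t = x.permute(0, 2, 1, 3).reshape(B * N, T, C)
        h = self.norm2(t)
        t = t + self.temporal_attn(h, h, h)[0]
        x = t.reshape(B, N, T, C).permute(0, 2, 1, 3)
        # Pointwise feed-forward over each point feature.
        return x + self.mlp(self.norm3(x))

# Usage on dummy trajectories: 2 sequences, 16 frames, 1024 points, 128-dim.
block = SpatioTemporalBlock(dim=128)
traj = torch.randn(2, 16, 1024, 128)
out = block(traj)  # same shape: (2, 16, 1024, 128)
```

In the paper's setting, blocks like this would presumably be stacked inside the trajectory diffusion backbone, with physics parameters and applied forces injected as additional conditioning signals; the exact conditioning mechanism is not specified in the abstract.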