PhyRPR: Training-Free Physics-Constrained Video Generation
January 14, 2026
Authors: Yibo Zhao, Hengjia Li, Xiaofei He, Boxi Wu
cs.AI
Abstract
Recent diffusion-based video generation models can synthesize visually plausible videos, yet they often struggle to satisfy physical constraints. A key reason is that most existing approaches remain single-stage: they entangle high-level physical understanding with low-level visual synthesis, making it hard to generate content that requires explicit physical reasoning. To address this limitation, we propose PhyRPR, a training-free three-stage pipeline (PhyReason–PhyPlan–PhyRefine) that decouples physical understanding from visual synthesis. Specifically, PhyReason uses a large multimodal model for physical state reasoning and an image generator for keyframe synthesis; PhyPlan deterministically synthesizes a controllable coarse motion scaffold; and PhyRefine injects this scaffold into diffusion sampling via a latent fusion strategy to refine appearance while preserving the planned dynamics. This staged design enables explicit physical control during generation. Extensive experiments under physical constraints show that our method consistently improves physical plausibility and motion controllability.
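
For concreteness, the sketch below shows one plausible form of the PhyRefine latent-fusion loop. The abstract does not specify the fusion rule, so this is a minimal SDEdit-style approximation rather than the paper's exact method; all names here (latent_fusion_sample, denoise_step, add_noise, fuse_frac, alpha) are hypothetical placeholders, and the stand-in callables in the usage example substitute for a real video diffusion model's step function and noise schedule.

import torch

def latent_fusion_sample(denoise_step, add_noise, scaffold_latent,
                         timesteps, fuse_frac=0.4, alpha=0.8):
    """One plausible PhyRefine loop: blend a noised copy of the planned
    motion scaffold into the early, structure-defining denoising steps,
    then let the model refine appearance freely in the later steps."""
    z = torch.randn_like(scaffold_latent)    # start from pure noise
    n = len(timesteps)
    for i, t in enumerate(timesteps):
        z = denoise_step(z, t)               # model's denoising update
        if i < fuse_frac * n:
            # Latent fusion: pull the sample toward the scaffold, noised
            # to the current timestep so the blend stays on-distribution.
            z = alpha * add_noise(scaffold_latent, t) + (1 - alpha) * z
    return z

# Toy usage with stand-in callables:
scaffold = torch.zeros(1, 4, 16, 32, 32)     # (B, C, T, H, W) video latent
steps = torch.linspace(1.0, 0.0, 50)         # descending noise levels
fake_denoise = lambda z, t: 0.98 * z                     # placeholder
fake_noise = lambda x, t: x + t * torch.randn_like(x)    # placeholder
out = latent_fusion_sample(fake_denoise, fake_noise, scaffold, steps)
print(out.shape)  # torch.Size([1, 4, 16, 32, 32])

Fusing only during the early fraction of steps is the key design choice in this sketch: early denoising steps fix coarse structure (here, the planned motion), while later steps determine fine appearance, which matches the stated goal of refining appearance while preserving the planned dynamics.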