

PhyRPR: Training-Free Physics-Constrained Video Generation

January 14, 2026
Authors: Yibo Zhao, Hengjia Li, Xiaofei He, Boxi Wu
cs.AI

Abstract

Recent diffusion-based video generation models can synthesize visually plausible videos, yet they often struggle to satisfy physical constraints. A key reason is that most existing approaches remain single-stage: they entangle high-level physical understanding with low-level visual synthesis, making it hard to generate content that requires explicit physical reasoning. To address this limitation, we propose a training-free three-stage pipeline, PhyRPR: PhyReason–PhyPlan–PhyRefine, which decouples physical understanding from visual synthesis. Specifically, PhyReason uses a large multimodal model for physical state reasoning and an image generator for keyframe synthesis; PhyPlan deterministically synthesizes a controllable coarse motion scaffold; and PhyRefine injects this scaffold into diffusion sampling via a latent fusion strategy to refine appearance while preserving the planned dynamics. This staged design enables explicit physical control during generation. Extensive experiments under physics constraints show that our method consistently improves physical plausibility and motion controllability.
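The latent fusion idea in the PhyRefine stage can be sketched as follows: during early denoising steps, the sampling latent is blended toward the encoded motion scaffold so coarse dynamics are preserved, while later steps run unconstrained so the diffusion model refines appearance. This is a minimal illustrative sketch, not the paper's implementation; the denoiser stub, function names, and the linear fusion schedule (`fuse_until`, decaying weight `alpha`) are all assumptions.

```python
import numpy as np

def ddim_step(latent, t):
    """Hypothetical stand-in for one denoiser update; a real pipeline
    would call the diffusion model's sampler here."""
    return latent * 0.99

def fuse_scaffold(latent, scaffold_latent, t, num_steps, fuse_until=0.5):
    """Blend the planned motion scaffold into the sampling latent.
    Early steps follow the scaffold (coarse dynamics); after the
    fuse_until fraction of steps, the latent evolves freely so the
    model can refine appearance."""
    progress = t / num_steps
    if progress < fuse_until:
        # Scaffold weight decays linearly from 1.0 to 0.0 over the
        # fusion window.
        alpha = 1.0 - progress / fuse_until
        return alpha * scaffold_latent + (1.0 - alpha) * latent
    return latent

def sample_with_scaffold(scaffold_latent, num_steps=50):
    """Run the (stubbed) sampling loop with scaffold fusion."""
    latent = np.random.randn(*scaffold_latent.shape)
    for t in range(num_steps):
        latent = ddim_step(latent, t)
        latent = fuse_scaffold(latent, scaffold_latent, t, num_steps)
    return latent
```

The key design choice this sketch highlights is that the scaffold constrains motion structure only during the early, structure-determining part of sampling, leaving fine appearance to the unconstrained later steps.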