PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding
November 25, 2025
Authors: Haoze Zhang, Tianyu Huang, Zichen Wan, Xiaowei Jin, Hongzhi Zhang, Hui Li, Wangmeng Zuo
cs.AI
Abstract
While recent video generation models have achieved significant visual fidelity, they often lack explicit physical controllability and plausibility. To address this, some recent studies have attempted to guide video generation with physics-based rendering. However, these methods face inherent challenges in accurately modeling complex physical properties and in effectively controlling the resulting physical behavior over extended temporal sequences. In this work, we introduce PhysChoreo, a novel framework that can generate videos with diverse controllability and physical realism from a single image. Our method consists of two stages. First, it estimates the static initial physical properties of all objects in the image through part-aware physical property reconstruction. Then, through temporally instructed and physically editable simulation, it synthesizes high-quality videos with rich dynamic behaviors and physical realism. Experimental results show that PhysChoreo can generate videos with rich behaviors and physical realism, outperforming state-of-the-art methods on multiple evaluation metrics.