ProPhy: Progressive Physical Alignment for Dynamic World Simulation

December 5, 2025
Authors: Zijun Wang, Panwen Hu, Jing Wang, Terry Jingchen Zhang, Yuhao Cheng, Long Chen, Yiqiang Yan, Zutao Jiang, Hanhui Li, Xiaodan Liang
cs.AI

Abstract

Recent advances in video generation have shown remarkable potential for constructing world simulators. However, current models still struggle to produce physically consistent results, particularly when handling large-scale or complex dynamics. This limitation arises primarily because existing approaches respond isotropically to physical prompts and neglect the fine-grained alignment between generated content and localized physical cues. To address these challenges, we propose ProPhy, a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation. ProPhy employs a two-stage Mixture-of-Physics-Experts (MoPE) mechanism for discriminative physical prior extraction, where Semantic Experts infer semantic-level physical principles from textual descriptions, and Refinement Experts capture token-level physical dynamics. This mechanism allows the model to learn fine-grained, physics-aware video representations that better reflect underlying physical laws. Furthermore, we introduce a physical alignment strategy that transfers the physical reasoning capabilities of vision-language models (VLMs) into the Refinement Experts, facilitating a more accurate representation of dynamic physical phenomena. Extensive experiments on physics-aware video generation benchmarks demonstrate that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
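
The abstract gives no implementation details, so the following is only a toy sketch of what two-stage expert routing could look like, not the authors' method. It assumes softmax routing over dot-product scores and additive conditioning. Every name in it (`two_stage_mope`, `sem_keys`, `sem_priors`, `ref_keys`, `ref_priors`) is hypothetical. Stage 1 makes one routing decision per clip from the text embedding (semantic level). Stage 2 routes each video token separately (token level), which is one way a model could respond differently to different regions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mix(weights, vectors):
    """Weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def two_stage_mope(text_emb, tokens, sem_keys, sem_priors, ref_keys, ref_priors):
    """Toy two-stage routing (hypothetical; not the paper's architecture).

    text_emb   : text embedding, list of floats
    tokens     : video token vectors, list of lists of floats
    sem_keys   : one routing key per semantic expert (assumed)
    sem_priors : one prior vector per semantic expert (assumed)
    ref_keys   : one routing key per refinement expert (assumed)
    ref_priors : one prior vector per refinement expert (assumed)
    """
    # Stage 1 (semantic level): a single routing decision for the whole
    # clip, driven only by the text embedding.
    sem_weights = softmax([dot(text_emb, k) for k in sem_keys])
    global_prior = mix(sem_weights, sem_priors)

    # Stage 2 (token level): every token is routed on its own, so
    # different tokens can receive different physical priors.
    outputs, token_routes = [], []
    for tok in tokens:
        cond = [t + g for t, g in zip(tok, global_prior)]
        ref_weights = softmax([dot(cond, k) for k in ref_keys])
        local_prior = mix(ref_weights, ref_priors)
        outputs.append([c + p for c, p in zip(cond, local_prior)])
        token_routes.append(ref_weights)
    return outputs, sem_weights, token_routes
```

In this sketch the per-token weights in `token_routes` are what make the response non-uniform across tokens: two tokens in the same clip share `global_prior` but can be sent to different refinement experts. A real model would use learned networks in place of the fixed key and prior vectors assumed here.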