

Physical Simulator In-the-Loop Video Generation

March 6, 2026
Authors: Lin Geng Foo, Mark He Huang, Alexandros Lattas, Stylianos Moschoglou, Thabo Beeler, Christian Theobalt
cs.AI

Abstract

Recent advances in diffusion-based video generation have achieved remarkable visual realism but still struggle to obey basic physical laws such as gravity, inertia, and collision. Generated objects often move inconsistently across frames, exhibit implausible dynamics, or violate physical constraints, limiting the realism and reliability of AI-generated videos. We address this gap by introducing Physical Simulator In-the-loop Video Generation (PSIVG), a novel framework that integrates a physical simulator into the video diffusion process. Starting from a template video generated by a pre-trained diffusion model, PSIVG reconstructs the 4D scene and foreground object meshes, initializes them within a physical simulator, and generates physically consistent trajectories. These simulated trajectories are then used to guide the video generator toward spatio-temporally physically coherent motion. To further improve texture consistency during object movement, we propose a Test-Time Texture Consistency Optimization (TTCO) technique that adapts text and feature embeddings based on pixel correspondences from the simulator. Comprehensive experiments demonstrate that PSIVG produces videos that better adhere to real-world physics while preserving visual quality and diversity. Project Page: https://vcai.mpi-inf.mpg.de/projects/PSIVG/
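The core idea of the abstract — roll out object motion in a physical simulator and use the resulting trajectories to condition the generator — can be illustrated with a deliberately tiny toy. This is a minimal sketch, not the authors' implementation: a 1D vertical fall under gravity with a ground-plane collision, whose per-frame heights could serve as guidance targets in place of free-form generated motion.

```python
def simulate_trajectory(y0, v0, g=-9.81, dt=1 / 24, n_frames=24):
    """Roll out a 1D vertical trajectory under gravity, clamped at the ground.

    Returns one height per video frame; a real simulator would produce full
    6-DoF rigid-body trajectories for the reconstructed foreground meshes.
    """
    ys, y, v = [], y0, v0
    for _ in range(n_frames):
        v += g * dt                # gravity updates velocity
        y = max(0.0, y + v * dt)   # collision with the ground plane
        if y == 0.0:
            v = 0.0                # inelastic contact: object comes to rest
        ys.append(y)
    return ys

# Per-frame heights for an object dropped from 2 m over a 1-second clip.
traj = simulate_trajectory(y0=2.0, v0=0.0)
```

Each frame's simulated state would then condition the diffusion sampler, so the generated object obeys gravity and collision instead of exhibiting the implausible dynamics the abstract describes.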