

PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation

December 31, 2025
Authors: Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Feng Liang, Weifeng Chen, Felix Juefei-Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, Ji Hou
cs.AI

Abstract

Recent advances in text-to-video (T2V) generation have achieved strong visual quality, yet synthesizing videos that faithfully follow physical laws remains an open challenge. Existing methods, mainly based on graphics or prompt extension, struggle to generalize beyond simple simulated environments or to learn implicit physical reasoning. The scarcity of training data rich in physical interactions and phenomena is another key bottleneck. In this paper, we first introduce a Physics-Augmented video data construction pipeline (PhyAugPipe) that leverages a vision-language model (VLM) with chain-of-thought reasoning to collect a large-scale training dataset, PhyVidGen-135K. We then formulate a principled Physics-aware Groupwise Direct Preference Optimization (PhyGDPO) framework that builds on the groupwise Plackett-Luce probabilistic model to capture holistic preferences beyond pairwise comparisons. Within PhyGDPO, we design a Physics-Guided Rewarding (PGR) scheme that embeds VLM-based physics rewards to steer optimization toward physical consistency. We also propose a LoRA-Switch Reference (LoRA-SR) scheme that eliminates memory-heavy duplication of the reference model for efficient training. Experiments show that our method significantly outperforms state-of-the-art open-source methods on PhyGenBench and VideoPhy2. Please check our project page at https://caiyuanhao1998.github.io/project/PhyGDPO for more video results. Our code, models, and data will be released at https://github.com/caiyuanhao1998/Open-PhyGDPO.
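
The abstract only names the groupwise Plackett-Luce model; as a rough illustration (not PhyGDPO's exact objective), a groupwise DPO-style loss built on that model typically has the form below, where the group size K, the temperature β, the policy π_θ, the reference π_ref, and the physics-reward-induced ranking are assumed notation rather than definitions from the paper.

```latex
% Illustrative sketch only, not the paper's exact loss.
% A group of K videos for prompt x is ranked y_{(1)} \succ \cdots \succ y_{(K)},
% e.g. by a VLM-based physics reward, and scored with DPO-style implicit rewards:
s_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
P_\theta\bigl(y_{(1)} \succ \cdots \succ y_{(K)} \mid x\bigr)
  = \prod_{k=1}^{K}
    \frac{\exp\bigl(s_\theta(x, y_{(k)})\bigr)}
         {\sum_{j=k}^{K} \exp\bigl(s_\theta(x, y_{(j)})\bigr)},
\qquad
\mathcal{L}_{\mathrm{group}}
  = -\,\mathbb{E}\Bigl[\log P_\theta\bigl(y_{(1)} \succ \cdots \succ y_{(K)} \mid x\bigr)\Bigr].
```

For K = 2 this reduces to the familiar pairwise DPO loss, which is why a groupwise Plackett-Luce objective can capture holistic preferences beyond pairwise comparisons.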
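The LoRA-Switch Reference idea of removing the duplicated reference model is commonly realized by toggling a LoRA adapter: the adapter-augmented model acts as the policy, and disabling the adapter recovers the frozen base weights for the reference pass. The sketch below illustrates that general pattern with Hugging Face PEFT; it is an assumption about the mechanism, not the paper's implementation, and `log_prob` as well as the `target_modules` names are hypothetical placeholders.

```python
# Minimal sketch, assuming a PEFT/LoRA setup: one model copy serves as both
# policy (adapter on) and reference (adapter off).
import torch
from peft import LoraConfig, get_peft_model


def log_prob(model, prompts, videos):
    """Hypothetical stand-in for the model-specific video log-likelihood
    (e.g. a per-sample diffusion-loss surrogate in a T2V DPO setup)."""
    raise NotImplementedError


def make_policy(base_model):
    # Wrap the frozen base weights with a trainable LoRA adapter (the policy).
    cfg = LoraConfig(r=16, lora_alpha=32,
                     target_modules=["to_q", "to_k", "to_v"])  # assumed module names
    return get_peft_model(base_model, cfg)


def policy_and_reference_logprobs(policy, prompts, videos):
    # Policy pass: LoRA adapter active, gradients flow into the LoRA weights.
    logp_policy = log_prob(policy, prompts, videos)

    # Reference pass: temporarily switch the adapter off so the same forward
    # uses only the original frozen weights -- no second reference model in memory.
    with torch.no_grad(), policy.disable_adapter():
        logp_ref = log_prob(policy, prompts, videos)

    return logp_policy, logp_ref
```

Compared with keeping a separate frozen copy of the T2V backbone, switching the adapter off trades a second set of full model weights for a brief extra forward pass, which is the memory saving the abstract attributes to LoRA-SR.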