PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation

December 31, 2025
Authors: Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Feng Liang, Weifeng Chen, Felix Juefei-Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, Ji Hou
cs.AI

Abstract

Recent advances in text-to-video (T2V) generation have achieved strong visual quality, yet synthesizing videos that faithfully follow physical laws remains an open challenge. Existing methods, mainly based on graphics simulation or prompt extension, struggle to generalize beyond simple simulated environments or to learn implicit physical reasoning. The scarcity of training data rich in physical interactions and phenomena is another key bottleneck. In this paper, we first introduce a Physics-Augmented video data construction Pipeline, PhyAugPipe, that leverages a vision-language model (VLM) with chain-of-thought reasoning to collect a large-scale training dataset, PhyVidGen-135K. We then formulate a principled Physics-aware Groupwise Direct Preference Optimization (PhyGDPO) framework that builds upon the groupwise Plackett-Luce probabilistic model to capture holistic preferences beyond pairwise comparisons. In PhyGDPO, we design a Physics-Guided Rewarding (PGR) scheme that embeds VLM-based physics rewards to steer optimization toward physical consistency. We also propose a LoRA-Switch Reference (LoRA-SR) scheme that eliminates memory-heavy reference-model duplication for efficient training. Experiments show that our method significantly outperforms state-of-the-art open-source methods on PhyGenBench and VideoPhy2. Please check our project page at https://caiyuanhao1998.github.io/project/PhyGDPO for more video results. Our code, models, and data will be released at https://github.com/caiyuanhao1998/Open-PhyGDPO.
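For context on the groupwise formulation, the Plackett-Luce model generalizes the Bradley-Terry pairwise model to a ranking over a whole group of candidates. The sketch below is the standard Plackett-Luce extension of DPO from the preference-optimization literature, written for a group of K ranked videos y_1 ≻ … ≻ y_K for a prompt x; it is not the exact PhyGDPO objective, which additionally incorporates the physics-guided reward described in the abstract.

```latex
% Standard Plackett-Luce extension of DPO over K ranked samples
% y_1 \succ y_2 \succ \dots \succ y_K for a prompt x.
\mathcal{L}_{\mathrm{PL}}(\theta) \;=\;
  -\,\mathbb{E}_{(x,\; y_1 \succ \dots \succ y_K)}
  \left[
    \log \prod_{k=1}^{K}
    \frac{\exp\!\left(\beta \log \frac{\pi_\theta(y_k \mid x)}{\pi_{\mathrm{ref}}(y_k \mid x)}\right)}
         {\sum_{j=k}^{K} \exp\!\left(\beta \log \frac{\pi_\theta(y_j \mid x)}{\pi_{\mathrm{ref}}(y_j \mid x)}\right)}
  \right]
```

Setting K = 2 recovers the usual pairwise DPO loss; ranking an entire group at once is what allows the objective to capture holistic preferences beyond pairwise comparisons, with a VLM-based physics reward (as in PGR) supplying or re-weighting the group ranking.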
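The LoRA-Switch Reference idea, as described, avoids keeping a separate frozen copy of the backbone for the reference policy. One plausible realization is sketched below, assuming the policy is the base T2V model plus a trainable LoRA adapter managed with Hugging Face peft; all function and variable names (e.g. batch.logp) are illustrative stand-ins, not taken from the released code.

```python
# Minimal sketch of a "switch the LoRA adapter off to recover the reference model"
# scheme, assuming the policy = frozen base T2V model + trainable LoRA adapter (peft).
# All names below are illustrative; the official PhyGDPO implementation may differ.
import torch
from peft import LoraConfig, get_peft_model

def make_policy(base_model):
    """Wrap the frozen base model with a trainable LoRA adapter."""
    lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["to_q", "to_k", "to_v"])
    return get_peft_model(base_model, lora_cfg)

def policy_and_reference_logps(policy, batch):
    """Return log-probs under the policy (adapter on) and the reference (adapter off).

    `batch.logp(model)` stands in for whatever routine scores a sampled video
    under the model (e.g. the diffusion log-likelihood surrogate used by the
    DPO-style objective).
    """
    logp_policy = batch.logp(policy)                 # LoRA active  -> pi_theta
    with torch.no_grad(), policy.disable_adapter():  # LoRA bypassed -> pi_ref
        logp_ref = batch.logp(policy)
    return logp_policy, logp_ref
```

Because the adapter is merely bypassed rather than duplicated, the reference pass reuses the same weights already in memory, which is presumably what "eliminates memory-heavy reference duplication" refers to.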