
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

November 29, 2025
Authors: Minh-Quan Le, Yuanzhi Zhu, Vicky Kalogeiton, Dimitris Samaras
cs.AI

Abstract

Recent video diffusion models can synthesize visually compelling clips, yet often violate basic physical laws (objects float, accelerations drift, and collisions behave inconsistently), revealing a persistent gap between visual realism and physical realism. We propose NewtonRewards, the first physics-grounded post-training framework for video generation based on verifiable rewards. Instead of relying on human or VLM feedback, NewtonRewards extracts measurable proxies from generated videos using frozen utility models: optical flow serves as a proxy for velocity, while high-level appearance features serve as a proxy for mass. These proxies enable explicit enforcement of Newtonian structure through two complementary rewards: a Newtonian kinematic constraint enforcing constant-acceleration dynamics, and a mass conservation reward preventing trivial, degenerate solutions. We evaluate NewtonRewards on five Newtonian Motion Primitives (free fall, horizontal/parabolic throw, and ramp sliding down/up) using our newly constructed large-scale benchmark, NewtonBench-60K. Across all primitives in visual and physics metrics, NewtonRewards consistently improves physical plausibility, motion smoothness, and temporal coherence over prior post-training methods. It further maintains strong performance under out-of-distribution shifts in height, speed, and friction. Our results show that physics-grounded verifiable rewards offer a scalable path toward physics-aware video generation.
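
To make the kinematic constraint concrete, here is a minimal sketch of one way a constant-acceleration check could be scored from a per-frame velocity proxy. It is an illustration under stated assumptions, not the reward defined in the paper: the function name `constant_acceleration_residual`, the (T, 2) velocity layout, and the 30 fps frame rate are all hypothetical. The idea it relies on is that under constant acceleration the velocity is linear in time, so its second finite difference vanishes.

```python
import numpy as np


def constant_acceleration_residual(velocities):
    """Mean squared second difference of a (T, 2) per-frame velocity sequence.

    Hypothetical helper: a value near zero means the sequence is consistent
    with constant acceleration; a reward could be its negative.
    """
    v = np.asarray(velocities, dtype=float)
    if v.ndim != 2 or v.shape[0] < 3:
        raise ValueError("expected a (T, D) array with at least three frames")
    # Under constant acceleration, v[t+1] - 2*v[t] + v[t-1] is zero for all t.
    second_diff = v[2:] - 2.0 * v[1:-1] + v[:-2]
    return float(np.mean(np.sum(second_diff ** 2, axis=-1)))


# Synthetic velocities standing in for an optical-flow proxy (assumed 30 fps).
t = np.arange(10) / 30.0
free_fall = np.stack([np.zeros_like(t), 9.8 * t], axis=1)            # linear in t
drifting = np.stack([np.zeros_like(t), 9.8 * t + 30.0 * t ** 2], axis=1)

free_fall_residual = constant_acceleration_residual(free_fall)
drifting_residual = constant_acceleration_residual(drifting)
```

In this sketch the free-fall sequence scores a residual of essentially zero, while the sequence whose acceleration drifts over time scores a strictly larger one, so negating the residual would rank the physically consistent clip higher.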