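This work addresses the modeling of physical laws in video generation with a post-training optimization framework. It introduces a verifiable reward mechanism that embeds Newton's laws of motion into the generative model as physical constraints, so that generated video sequences follow the motion expected under gravity. Specifically, a verification module based on physical equations produces reward signals that steer the model toward dynamics consistent with Newtonian mechanics while preserving its original generative ability. Experiments indicate that the method improves the physical plausibility of generated videos without retraining the base model from scratch.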

What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

November 29, 2025
Authors: Minh-Quan Le, Yuanzhi Zhu, Vicky Kalogeiton, Dimitris Samaras
cs.AI

Abstract

Recent video diffusion models can synthesize visually compelling clips, yet they often violate basic physical laws: objects float, accelerations drift, and collisions behave inconsistently. This reveals a persistent gap between visual realism and physical realism. We propose NewtonRewards, the first physics-grounded post-training framework for video generation based on verifiable rewards. Instead of relying on human or VLM feedback, NewtonRewards extracts measurable proxies from generated videos using frozen utility models: optical flow serves as a proxy for velocity, while high-level appearance features serve as a proxy for mass. These proxies enable explicit enforcement of Newtonian structure through two complementary rewards: a Newtonian kinematic constraint that enforces constant-acceleration dynamics, and a mass conservation reward that prevents trivial, degenerate solutions. We evaluate NewtonRewards on five Newtonian Motion Primitives (free fall, horizontal/parabolic throw, and ramp sliding down/up) using our newly constructed large-scale benchmark, NewtonBench-60K. Across all primitives, and on both visual and physics metrics, NewtonRewards consistently improves physical plausibility, motion smoothness, and temporal coherence over prior post-training methods. It further maintains strong performance under out-of-distribution shifts in height, speed, and friction. Our results show that physics-grounded verifiable rewards offer a scalable path toward physics-aware video generation.
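To make the idea of a constant-acceleration constraint concrete, the following is a minimal sketch and not the paper's actual reward. It assumes that a per-frame 2D velocity for the moving object has already been obtained (for example, by averaging optical flow over the object), and it scores a clip by the mean squared second difference of that velocity sequence, which is zero exactly when acceleration is constant. The names `kinematic_residual`, `kinematic_reward`, and `free_fall_velocities` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]  # (vx, vy) for one frame, e.g. pooled optical flow


def kinematic_residual(velocities: Sequence[Vec]) -> float:
    """Mean squared second difference of a per-frame velocity sequence.

    Under constant acceleration, v[t+1] - 2*v[t] + v[t-1] = 0 for every t,
    so the residual is 0 for ideal motion and grows as acceleration drifts.
    """
    if len(velocities) < 3:
        raise ValueError("need at least three velocity samples")
    total = 0.0
    for prev, cur, nxt in zip(velocities, velocities[1:], velocities[2:]):
        dx = nxt[0] - 2.0 * cur[0] + prev[0]
        dy = nxt[1] - 2.0 * cur[1] + prev[1]
        total += dx * dx + dy * dy
    return total / (len(velocities) - 2)


def kinematic_reward(velocities: Sequence[Vec]) -> float:
    """Illustrative reward: higher (closer to 0) means more Newtonian."""
    return -kinematic_residual(velocities)


def free_fall_velocities(n_frames: int, g: float = 9.81,
                         fps: float = 24.0, vx: float = 0.0) -> List[Vec]:
    """Ideal velocities for a throw under gravity (image y axis pointing down)."""
    dt = 1.0 / fps
    return [(vx, g * t * dt) for t in range(n_frames)]


if __name__ == "__main__":
    ideal = free_fall_velocities(16)
    drifting = [(0.0, 0.5 * t * t) for t in range(16)]  # acceleration grows over time
    print("ideal reward:   ", kinematic_reward(ideal))
    print("drifting reward:", kinematic_reward(drifting))
```

Note that a completely static clip (all velocities zero) also has zero residual under this sketch. That is one plausible reading of why the abstract pairs the kinematic constraint with a second reward that rules out trivial, degenerate solutions.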
December 3, 2025