
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

October 15, 2025
Authors: Sihui Ji, Xi Chen, Xin Tao, Pengfei Wan, Hengshuang Zhao
cs.AI

Abstract

Current video generation models can produce visually realistic videos but often fail to adhere to physical laws, which limits their ability to generate physically plausible videos and to serve as "world models". To address this issue, we propose PhysMaster, which captures physical knowledge as a representation that guides video generation models and enhances their physics-awareness. Specifically, PhysMaster builds on the image-to-video task, where the model is expected to predict physically plausible dynamics from an input image. Since the input image provides physical priors such as the relative positions and potential interactions of objects in the scene, we devise PhysEncoder to encode physical information from it as an extra condition that injects physical knowledge into the video generation process. Because the model lacks proper supervision of its physical performance beyond mere appearance, PhysEncoder applies reinforcement learning with human feedback to physical representation learning, leveraging feedback from generation models to optimize the physical representation with Direct Preference Optimization (DPO) in an end-to-end manner. PhysMaster provides a feasible way to improve the physics-awareness of PhysEncoder, and thus of video generation, demonstrating its ability on a simple proxy task and its generalizability to a wide range of physical scenarios. This implies that PhysMaster, which unifies solutions for various physical processes via representation learning in the reinforcement learning paradigm, can act as a generic, plug-in solution for physics-aware video generation and broader applications.
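The abstract names Direct Preference Optimization (DPO) as the training signal but does not spell out the objective. As a rough illustration only, the sketch below computes the generic DPO loss for a single preference pair from log-likelihoods. It is not the authors' implementation: the function name `dpo_loss`, its parameters, and the default `beta` are hypothetical, and the paper may use a variant adapted to video generation models that the abstract does not describe.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one (chosen, rejected) pair.

    Inputs are log-likelihoods of the preferred and dispreferred samples
    under the trainable policy and under a frozen reference model.
    All names and the default beta are illustrative assumptions.
    """
    # How much more the policy favors the chosen sample over the rejected
    # one, relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # written in a numerically stable form.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# The loss equals log(2) when the policy matches the reference, and it
# decreases as the policy favors the preferred sample more strongly.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a setting like the one the abstract describes, the preferred and dispreferred samples would presumably be generated videos ranked by physical plausibility, with gradients flowing back into the physical representation. That mapping is an assumption drawn from the abstract's wording, not a confirmed detail.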
October 16, 2025