

Video Generation Models Are Good Latent Reward Models

November 26, 2025
Authors: Xiaoyue Mi, Wenqing Yu, Jiesong Lian, Shibo Jie, Ruizhe Zhong, Zijun Liu, Guozhen Zhang, Zixiang Zhou, Zhiyong Xu, Yuan Zhou, Qinglin Lu, Fan Tang
cs.AI

Abstract

Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing video reward models rely on vision-language models designed for pixel-space inputs, confining ReFL optimization to near-complete denoising steps after computationally expensive VAE decoding. This pixel-space approach incurs substantial memory overhead and increased training time, and its late-stage optimization lacks early-stage supervision, refining only visual quality rather than fundamental motion dynamics and structural coherence. In this work, we show that pre-trained video generation models are naturally suited for reward modeling in the noisy latent space, as they are explicitly designed to process noisy latent representations at arbitrary timesteps and inherently preserve temporal information through their sequential modeling capabilities. Accordingly, we propose Process Reward Feedback Learning (PRFL), a framework that conducts preference optimization entirely in latent space, enabling efficient gradient backpropagation throughout the full denoising chain without VAE decoding. Extensive experiments demonstrate that PRFL significantly improves alignment with human preferences, while achieving substantial reductions in memory consumption and training time compared to RGB ReFL.
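To make the mechanism concrete, below is a minimal PyTorch sketch of the latent-space process-reward idea the abstract describes. The module names (VideoDenoiser, LatentRewardHead), the Euler-style update, and all shapes are illustrative assumptions rather than the paper's actual implementation; the only point being demonstrated is that the reward is scored on noisy latents at intermediate denoising steps, so gradients reach the full denoising chain with no VAE decode anywhere in the loop.

```python
import torch
import torch.nn as nn

class VideoDenoiser(nn.Module):
    """Toy stand-in for a pre-trained video diffusion backbone."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim)
        )

    def forward(self, z, t):
        # z: (batch, frames, dim) noisy latents; t: (batch,) timesteps in (0, 1]
        t_emb = t.view(-1, 1, 1).expand(-1, z.size(1), 1)
        return self.net(torch.cat([z, t_emb], dim=-1))

class LatentRewardHead(nn.Module):
    """Hypothetical latent-space reward: scores noisy latents directly,
    standing in for the pre-trained video model used as reward model."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, 1))

    def forward(self, z):
        # Pool per-frame scores into one scalar preference score per video.
        return self.score(z).mean(dim=(1, 2))

denoiser = VideoDenoiser()
reward = LatentRewardHead()
for p in reward.parameters():           # reward model stays frozen
    p.requires_grad_(False)
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

z = torch.randn(2, 8, 16)               # (batch, frames, latent_dim), pure noise
loss = torch.zeros(())
for t in torch.linspace(1.0, 0.1, 10):  # coarse 10-step denoising schedule
    t_batch = t.expand(z.size(0))
    z = z - 0.1 * denoiser(z, t_batch)  # toy Euler-style update, kept differentiable
    # Process reward: score the *noisy* latent at every intermediate step,
    # so early steps (motion, structure) receive supervision too.
    loss = loss - reward(z).mean()

opt.zero_grad()
loss.backward()   # gradients flow through the whole denoising chain, no VAE decode
opt.step()
```

The sketch holds the entire chain in one autograd graph only because the toy model is tiny; a full-scale video model would presumably need gradient checkpointing or a truncated backward pass to make this memory-feasible, which is consistent with the efficiency framing above.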