
PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

February 2, 2026
Authors: Minh-Quan Le, Gaurav Mittal, Cheng Zhao, David Gu, Dimitris Samaras, Mei Chen
cs.AI

Abstract

Text-to-video (T2V) generation aims to synthesize videos with high visual quality and temporal consistency that are semantically aligned with input text. Reward-based post-training has emerged as a promising direction to improve the quality and semantic alignment of generated videos. However, recent methods either rely on large-scale human preference annotations or operate on misaligned embeddings from pre-trained vision-language models, leading to limited scalability or suboptimal supervision. We present PISCES, an annotation-free post-training algorithm that addresses these limitations via a novel Dual Optimal Transport (OT)-aligned Rewards module. To align reward signals with human judgment, PISCES uses OT to bridge text and video embeddings at both distributional and discrete token levels, enabling reward supervision to fulfill two objectives: (i) a Distributional OT-aligned Quality Reward that captures overall visual quality and temporal coherence; and (ii) a Discrete Token-level OT-aligned Semantic Reward that enforces semantic, spatio-temporal correspondence between text and video tokens. To our knowledge, PISCES is the first to improve annotation-free reward supervision in generative post-training through the lens of OT. Experiments on both short- and long-video generation show that PISCES outperforms both annotation-based and annotation-free methods on VBench across Quality and Semantic scores, with human preference studies further validating its effectiveness. We show that the Dual OT-aligned Rewards module is compatible with multiple optimization paradigms, including direct backpropagation and reinforcement learning fine-tuning.
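To make the token-level idea concrete, below is a minimal sketch (not the authors' implementation) of a Discrete Token-level OT-aligned Semantic Reward: an entropic OT (Sinkhorn) alignment between text-token and video-token embeddings from a pre-trained vision-language model, with the negative transport cost used as a differentiable reward. All names and hyperparameters (sinkhorn, ot_semantic_reward, epsilon, n_iters) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F


def sinkhorn(cost: torch.Tensor, epsilon: float = 0.05, n_iters: int = 50) -> torch.Tensor:
    """Entropic-regularized OT plan for a cost matrix of shape (n, m),
    with uniform marginals over text tokens (rows) and video tokens (cols)."""
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=cost.device)
    nu = torch.full((m,), 1.0 / m, device=cost.device)
    K = torch.exp(-cost / epsilon)              # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):                    # Sinkhorn fixed-point iterations
        v = nu / (K.t() @ u).clamp_min(1e-8)
        u = mu / (K @ v).clamp_min(1e-8)
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # transport plan


def ot_semantic_reward(text_tokens: torch.Tensor, video_tokens: torch.Tensor) -> torch.Tensor:
    """Reward = negative OT cost between L2-normalized text and video tokens,
    using cosine distance as the ground cost. Higher means better alignment."""
    t = F.normalize(text_tokens, dim=-1)        # (n_text, d)
    v = F.normalize(video_tokens, dim=-1)       # (n_video, d)
    cost = 1.0 - t @ v.t()                      # cosine distance
    plan = sinkhorn(cost)
    return -(plan * cost).sum()                 # differentiable scalar reward


if __name__ == "__main__":
    torch.manual_seed(0)
    text = torch.randn(12, 512)                 # e.g., 12 prompt tokens (hypothetical sizes)
    video = torch.randn(256, 512)               # e.g., 256 spatio-temporal video tokens
    print(float(ot_semantic_reward(text, video)))
```

Because the reward is a differentiable function of the video-token embeddings, it can in principle be used either for direct backpropagation into the generator or as a scalar signal for reinforcement learning fine-tuning, which is the compatibility the abstract claims for the Dual OT-aligned Rewards module.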