RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Abstract

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online Reinforcement Learning (RL) directly to this kernel, avoiding the Euler-Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments demonstrate that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic-degree evaluations, and that CM-GRPO provides further gains when combined with RAVEN.
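
To make the CM-GRPO idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a common form of the consistency sampling step in which the predicted clean sample is re-noised to the next noise level s, so that the transition is an isotropic Gaussian with mean (1 - s) * x0_pred and standard deviation s. That parameterization, the function names, and the clipping value are all illustrative assumptions that the abstract does not specify. Under them, the log-probability of a step is available in closed form and can be plugged into a standard group-relative clipped objective.

```python
import math
import statistics


def transition_mean(x0_pred, s):
    """Mean of the ASSUMED kernel: predicted clean sample scaled by (1 - s)."""
    return [(1.0 - s) * v for v in x0_pred]


def gaussian_log_prob(x_next, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I) at x_next."""
    d = len(x_next)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_next, mean))
    return (
        -sq_dist / (2.0 * sigma ** 2)
        - d * math.log(sigma)
        - 0.5 * d * math.log(2.0 * math.pi)
    )


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts (GRPO-style)."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip=0.2):
    """PPO-style clipped objective for a single transition."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)
```

In use, one would score each sampled transition with `gaussian_log_prob(x_next, transition_mean(x0_pred, s), s)` under both the current and the sampling-time policy, then average `clipped_surrogate` over the group using the advantages from `group_relative_advantages`. Because the kernel is Gaussian by construction in this sketch, no auxiliary stochastic process is needed to obtain the log-probabilities.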