RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Abstract

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference limits generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows the losses on later chunks to supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online Reinforcement Learning (RL) directly to this kernel, avoiding the Euler-Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments demonstrate that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic-degree evaluations, and that CM-GRPO provides further gains when combined with RAVEN.
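
To make the CM-GRPO idea concrete, the following is a minimal sketch of how a consistency sampling step can be read as a conditional Gaussian transition and scored with a group-relative, clipped policy objective. It is an illustration under stated assumptions and not the paper's implementation: the re-noising rule `x_s = (1 - s) * x0_hat + s * eps` (a rectified-flow style interpolation), the clipping constant, and every function name here are hypothetical choices that the abstract does not specify.

```python
import math
import random
from statistics import mean, pstdev


def consistency_step(x_t, t, s, denoiser, rng):
    """One stochastic consistency step from noise level t to s, with 0 < s < t.

    Assumption: the denoiser predicts a clean endpoint x0_hat, which is then
    re-noised by linear interpolation. Under that assumption the transition is
    the Gaussian N((1 - s) * x0_hat, s^2 * I).
    """
    x0_hat = denoiser(x_t, t)
    mu = [(1.0 - s) * v for v in x0_hat]
    x_s = [m + s * rng.gauss(0.0, 1.0) for m in mu]
    return x_s, mu


def gaussian_log_prob(x, mu, sigma):
    """Log-density of an isotropic Gaussian N(mu, sigma^2 * I) at x (sigma > 0)."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mu))
    return (-0.5 * sq_dist / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2.0 * math.pi))


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts for the same prompt."""
    m = mean(rewards)
    sd = pstdev(rewards)
    return [(r - m) / (sd + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip=0.2):
    """PPO-style clipped objective for a single transition (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)
```

In this sketch, the log-probability of a sampled `x_s` would be evaluated with `gaussian_log_prob(x_s, mu, s)` under both the current and the rollout-time policy, and the resulting ratio would be weighted by the group-relative advantage of the rollout's reward. Because the kernel is already Gaussian, no auxiliary stochastic differential equation is needed to obtain a tractable likelihood; a final step to `s = 0` would be deterministic and is excluded here.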