RAVEN: Real-Time Autoregressive Video Extrapolation with Consistency-Model GRPO

Abstract

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models. However, a persistent gap remains between the history distributions seen during training and those that arise at inference, and this gap limits generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns the attention pattern used in training with inference-time extrapolation. It also lets downstream chunk losses supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online reinforcement learning (RL) directly to this kernel. This avoids the Euler-Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments show that RAVEN surpasses recent causal video distillation baselines on quality, semantic, and dynamic-degree evaluations, and that combining CM-GRPO with RAVEN yields further gains.
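The repacking idea can be illustrated with a minimal attention-mask sketch. The block layout and masking rules below are assumptions for illustration, not the RAVEN implementation:

* Each chunk i contributes a noisy denoising state followed by its clean endpoint.
* A noisy state attends to the clean endpoints of earlier chunks and to itself. This mirrors how the chunk being denoised reads a KV cache at inference.
* A clean endpoint attends to clean endpoints up to and including its own chunk.

The names `build_interleaved_layout` and `build_attention_mask` are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[str, int]  # ("noisy" | "clean", chunk index); hypothetical block descriptor


def build_interleaved_layout(num_chunks: int) -> List[Block]:
    """Hypothetical layout: chunk i contributes its noisy denoising state, then its clean endpoint."""
    layout: List[Block] = []
    for i in range(num_chunks):
        layout.append(("noisy", i))
        layout.append(("clean", i))
    return layout


def build_attention_mask(layout: List[Block]) -> List[List[bool]]:
    """mask[q][k] is True when query block q may attend to key block k.

    Assumed rules (mirroring chunk-wise inference with a KV cache of clean chunks):
      * noisy state of chunk i -> clean endpoints of chunks j < i, plus itself
      * clean endpoint of chunk i -> clean endpoints of chunks j <= i
    No block attends to another block's noisy state.
    """
    n = len(layout)
    mask = [[False] * n for _ in range(n)]
    for q, (q_kind, qi) in enumerate(layout):
        for k, (k_kind, ki) in enumerate(layout):
            if k_kind == "clean":
                # Earlier clean history is always visible; a clean block also sees itself.
                mask[q][k] = ki < qi or (q_kind == "clean" and ki == qi)
            else:
                # A noisy state is visible only to itself.
                mask[q][k] = q == k
    return mask


if __name__ == "__main__":
    layout = build_interleaved_layout(3)
    for (kind, i), row in zip(layout, build_attention_mask(layout)):
        print(f"{kind}{i}:", "".join("x" if allowed else "." for allowed in row))
```

For three chunks this prints the rows `noisy0: x.....`, `clean0: .x....`, `noisy1: .xx...`, `clean1: .x.x..`, `noisy2: .x.xx.`, `clean2: .x.x.x`.

The clean endpoints sit in the same differentiable sequence as the noisy states. A loss on the noisy state of chunk 2 can therefore send gradients through attention into the representations of chunks 0 and 1. This is one way to read the claim that downstream chunk losses supervise history representations.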
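The Gaussian view of a consistency step can be sketched as follows. The sketch assumes a rectified-flow style re-noising x_{t'} = (1 - t') * x0_pred + t' * eps with eps ~ N(0, I), where x0_pred is the consistency model's clean prediction at the current state.

Under that assumption the step is a Gaussian policy with mean (1 - t') * x0_pred and standard deviation t'. Its log-probability is therefore available in closed form. It can be plugged into a standard GRPO group-normalized advantage with a PPO-style clipped ratio.

This is an illustrative reading of the abstract, not the CM-GRPO implementation. All function names are hypothetical, and vectors are plain Python lists to keep the example dependency-free.

```python
import math
import random
from typing import List, Sequence, Tuple


def consistency_transition(x0_pred: Sequence[float], t_next: float) -> Tuple[List[float], float]:
    """Mean and std of the assumed kernel p(x_{t'} | x_t) = N((1 - t') * x0_pred, t'^2 I)."""
    if not 0.0 < t_next <= 1.0:
        raise ValueError("t_next must lie in (0, 1]; the final step to t'=0 is deterministic")
    return [(1.0 - t_next) * v for v in x0_pred], t_next


def gaussian_log_prob(x: Sequence[float], mean: Sequence[float], std: float) -> float:
    """Log density of an isotropic Gaussian N(mean, std^2 I) at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / std ** 2 - d * math.log(std) - 0.5 * d * math.log(2.0 * math.pi)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: rewards normalized within one group of rollouts."""
    mu = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards))
    return [(r - mu) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, adv: float, clip: float = 0.2) -> float:
    """PPO-style clipped objective term for one transition (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip), 1.0 - clip)
    return min(ratio * adv, clipped * adv)


if __name__ == "__main__":
    rng = random.Random(0)
    x0_old = [0.5, -1.0, 0.2]    # clean prediction from the behavior (old) policy
    x0_new = [0.55, -0.95, 0.2]  # clean prediction from the current policy
    t_next = 0.4
    mean_old, std = consistency_transition(x0_old, t_next)
    # Sample the next state from the old policy's Gaussian kernel.
    x_next = [m + std * rng.gauss(0.0, 1.0) for m in mean_old]
    mean_new, _ = consistency_transition(x0_new, t_next)
    logp_old = gaussian_log_prob(x_next, mean_old, std)
    logp_new = gaussian_log_prob(x_next, mean_new, std)
    # Advantage of rollout 1 within a hypothetical group of four rewards.
    adv = group_relative_advantages([0.1, 0.7, 0.4, 0.9])[1]
    print(clipped_surrogate(logp_new, logp_old, adv))
```

In this sketch the re-noising noise eps already makes each consistency step stochastic, so no auxiliary SDE is needed to define a likelihood ratio. A deterministic ODE flow sampler, by contrast, must first be converted into a stochastic process (for example via Euler-Maruyama). The final step to t' = 0 has no Gaussian density here and is excluded from the ratio.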