AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

Abstract

We present AAD-1, an asymmetric adversarial distillation framework for one-step autoregressive image-to-video generation. Current state-of-the-art methods adopt adversarial distillation but suffer from motion collapse and training instability, which results in static videos. AAD-1 addresses these problems with two key designs, one in the architecture and one in the training strategy. Our key architectural insight is to break the symmetry between the generator and the discriminator: the generator remains causal, which preserves its autoregressive sampling capability, while the discriminator attends bidirectionally over the full spatiotemporal context and produces a single holistic realism score for the entire video sequence. This asymmetric design lets the discriminator detect the global temporal failures and long-range drift that cause motion collapse in autoregressive generation. To stabilize training, we introduce a phased strategy that first uses distribution matching to bootstrap a stable one-step generator. This warm-up phase brings the student distribution closer to the teacher distribution before adversarial distillation begins. Extensive experiments on VBench demonstrate that AAD-1 achieves state-of-the-art performance in one-step autoregressive video generation.
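
The generator/discriminator asymmetry can be pictured in terms of attention masks. The following is only a minimal sketch, not the AAD-1 implementation: the function names, the frame-level (block-causal) granularity of the generator mask, and the mean pooling used to obtain a single score are all assumptions made for illustration, since the abstract does not specify them.

```python
def frame_causal_mask(num_frames, tokens_per_frame):
    """Hypothetical generator mask: mask[i][j] is True when token i may
    attend to token j. A token sees its own frame and all earlier frames
    (block-causal granularity is an assumption)."""
    n = num_frames * tokens_per_frame
    return [
        [(j // tokens_per_frame) <= (i // tokens_per_frame) for j in range(n)]
        for i in range(n)
    ]


def bidirectional_mask(num_frames, tokens_per_frame):
    """Hypothetical discriminator mask: every token attends to every
    token in the clip, past and future."""
    n = num_frames * tokens_per_frame
    return [[True] * n for _ in range(n)]


def holistic_score(token_logits):
    """Collapse per-token logits into one scalar for the whole video.
    Mean pooling is an assumed choice, used here only for illustration."""
    return sum(token_logits) / len(token_logits)


if __name__ == "__main__":
    causal = frame_causal_mask(num_frames=3, tokens_per_frame=2)
    full = bidirectional_mask(num_frames=3, tokens_per_frame=2)
    print(causal[0])  # tokens of the first frame cannot see later frames
    print(full[0])    # the discriminator sees the whole clip from any token
    print(holistic_score([0.2, 0.4, 0.6]))
```

Under these assumptions, a token in the first frame is blind to everything after it in the generator, whereas the discriminator can compare the first and last frames directly, which is what allows a single score to reflect drift that accumulates over the full sequence.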