One-Forcing: Towards Stable One-Step Autoregressive Video Generation

May 22, 2026
Authors: Jiaqi Feng, Justin Cui, Yuanhao Ban, Cho-Jui Hsieh
cs.AI

Abstract

Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, which are often distilled from a corresponding many-step teacher, default to a 4-step sampling configuration. This configuration still incurs considerable latency in deployment, and quality degrades severely when the number of sampling steps is reduced further, particularly in the one-step setting. Trajectory-style consistency distillation methods often produce videos with weak dynamics, while DMD-based approaches such as Self-Forcing tend to yield blurry frames. To address this challenge, we propose One-Forcing, a simple yet effective approach that augments the DMD objective with an auxiliary GAN loss for high-quality and efficient one-step video generation. Experiments on VBench show that One-Forcing achieves a total score of 83.76, establishing state-of-the-art performance among one-step causal video generation methods while remaining competitive with strong many-step approaches. We further demonstrate that one-step framewise autoregressive generation can be trained stably at merely one-third of the training cost of the chunkwise model, a setting in which prior methods have not succeeded.
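
The abstract does not state the objective explicitly. As a rough sketch of what "augmenting the DMD objective with an auxiliary GAN loss" could look like, the generic form below combines the standard distribution matching gradient with a non-saturating generator loss. The weight \lambda, the discriminator D_\phi, the time weighting w_t, and the choice of the non-saturating GAN form are all assumptions made for illustration, not details taken from the paper.

```latex
% Assumed combined generator objective (lambda is a hypothetical weight):
\mathcal{L}_{G}(\theta)
  = \mathcal{L}_{\mathrm{DMD}}(\theta)
  + \lambda \, \mathcal{L}_{\mathrm{GAN}}(\theta)

% Standard DMD gradient: difference between the score of the generator's
% (fake) distribution and the teacher's (real) score at the noised sample x_t:
\nabla_{\theta} \mathcal{L}_{\mathrm{DMD}}
  \approx \mathbb{E}_{z,\,t}\!\left[
      w_t \,\bigl( s_{\mathrm{fake}}(x_t, t) - s_{\mathrm{real}}(x_t, t) \bigr)\,
      \frac{\partial G_{\theta}(z)}{\partial \theta}
    \right]

% Assumed non-saturating GAN loss for the generator:
\mathcal{L}_{\mathrm{GAN}}(\theta)
  = \mathbb{E}_{z}\!\left[ -\log D_{\phi}\bigl( G_{\theta}(z) \bigr) \right]
```

Under these assumptions, the DMD term pulls the one-step generator's output distribution toward the teacher's, while the adversarial term would be the natural candidate for countering the blurriness that the abstract attributes to DMD-only training.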