Pathwise Test-Time Correction for Autoregressive Long Video Generation
February 5, 2026
Authors: Xunzhi Xiang, Zixuan Duan, Guiyu Zhang, Haiyu Zhang, Zhe Gao, Junta Wu, Shaofeng Zhang, Tengfei Wang, Qi Fan, Chunchao Guo
cs.AI
Abstract
Distilled autoregressive diffusion models facilitate real-time short video synthesis but suffer from severe error accumulation during long-sequence generation. While existing Test-Time Optimization (TTO) methods prove effective for images or short clips, we identify that they fail to mitigate drift in extended sequences due to unstable reward landscapes and the hypersensitivity of distilled parameters. To overcome these limitations, we introduce Test-Time Correction (TTC), a training-free alternative. Specifically, TTC utilizes the initial frame as a stable reference anchor to calibrate intermediate stochastic states along the sampling trajectory. Extensive experiments demonstrate that our method seamlessly integrates with various distilled models, extending generation lengths with negligible overhead while matching the quality of resource-intensive training-based methods on 30-second benchmarks.
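The anchoring idea can be illustrated with a toy sketch. The abstract does not specify how TTC calibrates intermediate states, so everything below is an assumption made for illustration: frames are reduced to a single scalar, the "denoiser" is a stand-in that copies the previous frame plus a small systematic error, and the correction is a simple blend of the intermediate clean estimate toward the first frame. The names `rollout`, `anchor_weight`, `bias`, and `noise` are hypothetical and do not come from the paper.

```python
import random


def rollout(num_frames, anchor_weight=0.0, bias=0.05, noise=0.02, steps=4, seed=0):
    """Toy autoregressive rollout over a scalar 'frame' (not the paper's algorithm).

    anchor_weight = 0 reproduces plain few-step sampling; a positive value
    pulls every intermediate clean estimate toward the first frame.
    """
    rng = random.Random(seed)
    anchor = 0.0                      # first frame, kept as the fixed reference
    frames = [anchor]
    for _ in range(num_frames - 1):
        prev = frames[-1]
        state = rng.gauss(0.0, 1.0)   # each new frame starts from pure noise
        for k in range(steps):
            sigma = 1.0 - k / steps            # current noise level, 1 -> 0
            sigma_next = 1.0 - (k + 1) / steps
            # Stand-in denoiser (assumption): trusts the conditioning frame at
            # high noise and the current state at low noise, with a small bias.
            x0_hat = (1.0 - sigma) * state + sigma * (prev + bias)
            # Hypothetical correction: blend the estimate toward the anchor.
            x0_hat += anchor_weight * (anchor - x0_hat)
            # Re-noise to the next level; the last step has sigma_next = 0.
            state = x0_hat + sigma_next * noise * rng.gauss(0.0, 1.0)
        frames.append(state)
    return frames


if __name__ == "__main__":
    free = rollout(200, anchor_weight=0.0)
    corrected = rollout(200, anchor_weight=0.2)
    print("final value without correction:", free[-1])
    print("final value with correction:   ", corrected[-1])
```

In this toy, the uncorrected rollout drifts by roughly `bias` per frame because each frame inherits the error of the one before it, while the anchored rollout stays close to the first frame. A real video model would need the correction to act on a representation that preserves motion, which this scalar sketch does not attempt to capture.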