
Self-Refining Video Sampling

January 26, 2026
作者: Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Saining Xie, Jaehong Yoon, Sung Ju Hwang
cs.AI

Abstract

Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this with external verifiers or additional training on augmented data, both of which are computationally expensive and remain limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a video generator pre-trained on large-scale datasets as its own refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, preventing artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment, achieving over 70% human preference against both the default sampler and a guidance-based sampler.
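The mechanism described above, re-noising a generated sample and denoising it again with the same pre-trained model, refining only regions where repeated passes disagree, can be sketched as follows. This is a minimal illustration under assumed details: the function names (`self_refine`, `denoise`), the re-noising schedule, and the variance-based uncertainty threshold `tau` are all hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def self_refine(x0, denoise, t_refine=0.4, n_iters=3, n_samples=4,
                tau=0.5, rng=None):
    """Hypothetical sketch of uncertainty-aware self-refinement.

    x0       : generated video, array of shape (T, H, W)
    denoise  : callable(x_noisy, t) -> denoised estimate; stands in for
               the pre-trained generator viewed as a denoising autoencoder
    t_refine : noise level for re-noising in the inner loop (assumed schedule)
    """
    rng = rng or np.random.default_rng(0)
    x = x0
    for _ in range(n_iters):
        # Several stochastic re-noise/denoise passes through the same model.
        estimates = []
        for _ in range(n_samples):
            noise = rng.standard_normal(x.shape)
            x_noisy = np.sqrt(1.0 - t_refine**2) * x + t_refine * noise
            estimates.append(denoise(x_noisy, t_refine))
        estimates = np.stack(estimates)
        mean_est = estimates.mean(axis=0)
        # Self-consistency: high variance across passes marks uncertain regions.
        uncertainty = estimates.var(axis=0)
        mask = uncertainty > tau * uncertainty.mean()
        # Refine only uncertain regions; leave consistent ones untouched
        # to avoid over-refinement artifacts.
        x = np.where(mask, mean_est, x)
    return x
```

With a real diffusion denoiser plugged in, the loop would run at inference time only; no verifier model or fine-tuning is involved, which is the core claim of the abstract.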