Self-Refining Video Sampling
January 26, 2026
Authors: Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Saining Xie, Jaehong Yoon, Sung Ju Hwang
cs.AI
Abstract
Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this with external verifiers or additional training on augmented data, which is computationally expensive and still limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a pre-trained video generator, trained on large-scale datasets, as its own refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, preventing artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment, achieving over 70% human preference compared to the default and guidance-based samplers.
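To make the abstract's idea concrete, the following is a minimal, hypothetical sketch of an inner refinement loop of the kind described: re-noise the current sample, denoise it several times, and use the agreement (self-consistency) across denoised candidates as an uncertainty signal so that only low-variance regions are updated. All names here (`denoise`, `sigma`, `tau`, `n_samples`) are illustrative placeholders, and the toy denoiser stands in for a pre-trained video model; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x, sigma):
    """Stand-in for a pretrained video denoiser.

    A real model would predict the clean sample from a noisy input at
    noise level `sigma`; here we just shrink toward the global mean so
    the loop is runnable.
    """
    return 0.9 * x + 0.1 * x.mean()

def self_refine(x, n_iters=3, sigma=0.3, n_samples=4, tau=0.05):
    """Iterative inner-loop refinement with an uncertainty mask.

    x: array of shape (frames, height, width), treated as a video sample.
    """
    for _ in range(n_iters):
        # Re-noise and denoise several times to get refinement candidates.
        candidates = np.stack([
            denoise(x + sigma * rng.standard_normal(x.shape), sigma)
            for _ in range(n_samples)
        ])
        mean = candidates.mean(axis=0)
        var = candidates.var(axis=0)
        # Self-consistency: where candidates agree (low variance), the
        # model is confident, so we accept the refinement; elsewhere we
        # keep the current sample to avoid over-refinement artifacts.
        mask = (var < tau).astype(x.dtype)
        x = mask * mean + (1.0 - mask) * x
    return x

# Toy usage: refine a random "video" and check the shape is preserved.
video = rng.standard_normal((4, 8, 8))
refined = self_refine(video)
```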