
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model

May 23, 2025
Authors: Kwanyoung Kim, Sanghyun Kim
cs.AI

Abstract

The choice of initial noise significantly affects the quality and prompt alignment of video diffusion models, where different noise seeds for the same prompt can lead to drastically different generations. While recent methods rely on externally designed priors such as frequency filters or inter-frame smoothing, they often overlook internal model signals that indicate which noise seeds are inherently preferable. To address this, we propose ANSE (Active Noise Selection for Generation), a model-aware framework that selects high-quality noise seeds by quantifying attention-based uncertainty. At its core is BANSA (Bayesian Active Noise Selection via Attention), an acquisition function that measures entropy disagreement across multiple stochastic attention samples to estimate model confidence and consistency. For efficient inference-time deployment, we introduce a Bernoulli-masked approximation of BANSA that enables score estimation using a single diffusion step and a subset of attention layers. Experiments on CogVideoX-2B and 5B demonstrate that ANSE improves video quality and temporal coherence with only an 8% and 13% increase in inference time, respectively, providing a principled and generalizable approach to noise selection in video diffusion. See our project page: https://anse-project.github.io/anse-project/
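To make the acquisition function concrete, the sketch below illustrates the BALD-style entropy-disagreement idea the abstract describes: score each candidate noise seed by how much multiple stochastic attention samples disagree, then keep the most consistent seed. This is a minimal illustration under stated assumptions, not the paper's implementation: `sample_attention_maps` is a hypothetical hook standing in for one diffusion step with a fresh Bernoulli mask applied to a subset of attention layers, and the sketch assumes lower disagreement (higher confidence and consistency) is preferred.

```python
import torch


def entropy(p: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Shannon entropy along the last axis of softmax-normalized maps."""
    return -(p * (p + eps).log()).sum(dim=-1)


def bansa_score(attn_samples: torch.Tensor) -> torch.Tensor:
    """BALD-style entropy disagreement over K stochastic attention samples.

    attn_samples: (K, Q, D) stack of attention distributions for one
    noise seed, each drawn under an independent Bernoulli mask.
    Returns a non-negative scalar; 0 means all samples agree exactly.
    """
    h_of_mean = entropy(attn_samples.mean(dim=0)).mean()  # entropy of the mean map
    mean_of_h = entropy(attn_samples).mean()              # mean per-sample entropy
    return h_of_mean - mean_of_h                          # >= 0 by Jensen's inequality


def select_seed(seeds, sample_attention_maps, num_masks: int = 10):
    """Pick the seed whose attention samples disagree the least.

    `sample_attention_maps(seed)` is a hypothetical hook (not a real API):
    it should run a single diffusion step for `seed` with a fresh Bernoulli
    mask over a subset of attention layers and return a (Q, D)
    softmax-normalized attention map.
    """
    scores = [
        bansa_score(
            torch.stack([sample_attention_maps(s) for _ in range(num_masks)])
        ).item()
        for s in seeds
    ]
    return seeds[min(range(len(seeds)), key=scores.__getitem__)]
```

The Jensen-gap form (entropy of the mean distribution minus the mean of per-sample entropies) is the standard BALD mutual-information estimator; here it is applied to attention maps rather than predictive distributions, and the single-step, layer-subset restriction is what keeps the reported inference-time overhead to 8% and 13%.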
