FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
July 1, 2024
Authors: Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen
cs.AI
Abstract
We study Neural Foley, the automatic generation of high-quality sound effects
synchronized with videos, enabling an immersive audio-visual experience.
Despite its wide range of applications, existing approaches encounter
limitations when it comes to simultaneously synthesizing high-quality and
video-aligned (i.e., semantically relevant and temporally synchronized) sounds. To
overcome these limitations, we propose FoleyCrafter, a novel framework that
leverages a pre-trained text-to-audio model to ensure high-quality audio
generation. FoleyCrafter comprises two key components: the semantic adapter for
semantic alignment and the temporal controller for precise audio-video
synchronization. The semantic adapter utilizes parallel cross-attention layers
to condition audio generation on video features, producing realistic sound
effects that are semantically relevant to the visual content. Meanwhile, the
temporal controller incorporates an onset detector and a timestamp-based adapter
to achieve precise audio-video alignment. One notable advantage of FoleyCrafter
is its compatibility with text prompts, enabling the use of text descriptions
to achieve controllable and diverse video-to-audio generation according to user
intents. We conduct extensive quantitative and qualitative experiments on
standard benchmarks to verify the effectiveness of FoleyCrafter. Models and
code are available at https://github.com/open-mmlab/FoleyCrafter.
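
The parallel cross-attention idea behind the semantic adapter can be illustrated with a minimal sketch. It is not taken from the FoleyCrafter code base: the function names, the `video_scale` weight, and the omission of learned query/key/value projections are all assumptions made for illustration. The sketch only shows the general pattern in which audio latents attend to text features and to video features in two separate branches whose outputs are then summed.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Learned projection matrices are omitted (an assumption for brevity).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def parallel_cross_attention(audio_latents, text_feats, video_feats,
                             video_scale=1.0):
    """Hypothetical parallel cross-attention layer.

    The text branch and the video branch share the same queries and run
    side by side; the video branch is added with weight `video_scale`
    (a hypothetical parameter, not a name from the paper).
    """
    text_out = cross_attention(audio_latents, text_feats, text_feats)
    video_out = cross_attention(audio_latents, video_feats, video_feats)
    return [[t + video_scale * v for t, v in zip(t_row, v_row)]
            for t_row, v_row in zip(text_out, video_out)]
```

In this sketch, setting `video_scale` to 0 reduces the layer to plain text cross-attention, which is one way to see how a pre-trained text-to-audio backbone could be left untouched while video conditioning is added alongside it.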