

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

July 1, 2024
Authors: Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen
cs.AI

Abstract

We study Neural Foley, the automatic generation of high-quality sound effects synchronized with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches struggle to synthesize sounds that are simultaneously high-quality and video-aligned (i.e., semantically relevant and temporally synchronized). To overcome these limitations, we propose FoleyCrafter, a novel framework that leverages a pre-trained text-to-audio model to ensure high-quality audio generation. FoleyCrafter comprises two key components: a semantic adapter for semantic alignment and a temporal controller for precise audio-video synchronization. The semantic adapter uses parallel cross-attention layers to condition audio generation on video features, producing realistic sound effects that are semantically relevant to the visual content. Meanwhile, the temporal controller combines an onset detector with a timestamp-based adapter to achieve precise audio-video alignment. One notable advantage of FoleyCrafter is its compatibility with text prompts, which allows text descriptions to steer video-to-audio generation toward controllable and diverse results that match user intent. We conduct extensive quantitative and qualitative experiments on standard benchmarks to verify the effectiveness of FoleyCrafter. Models and code are available at https://github.com/open-mmlab/FoleyCrafter.
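The abstract names the two conditioning mechanisms without spelling out their internals. The following is a minimal NumPy sketch under stated assumptions, not FoleyCrafter's actual implementation. `parallel_cross_attention` assumes an IP-Adapter-style decoupled cross-attention, where one shared query attends separately to text embeddings and to projected video features and the two outputs are summed with a scale. `timestamp_condition` assumes the onset detector emits per-video-frame scores that are thresholded into a 0/1 sequence and resampled to the audio latent frame rate before reaching the timestamp-based adapter. All function names, weight keys, frame rates, and the 0.5 threshold are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def parallel_cross_attention(h, text_emb, video_emb, w, video_scale=1.0):
    """Hypothetical semantic-adapter layer (IP-Adapter-style decoupled attention).

    h:         (n_audio_tokens, d) hidden states inside the text-to-audio model
    text_emb:  (n_text, d) text-prompt embeddings (original, frozen branch)
    video_emb: (n_video, d) projected video features (added branch)
    w:         dict of (d, d) projections; the key names are illustrative only
    """
    q = h @ w["q"]  # one shared query for both branches
    text_out = attention(q, text_emb @ w["k_text"], text_emb @ w["v_text"])
    video_out = attention(q, video_emb @ w["k_video"], video_emb @ w["v_video"])
    # The branches run side by side and are summed; video_scale=0 falls back
    # to the original text-conditioned layer.
    return text_out + video_scale * video_out


def timestamp_condition(frame_probs, video_fps, audio_fps, n_audio_frames, threshold=0.5):
    """Hypothetical onset-to-timestamp step for the temporal controller.

    Thresholds per-video-frame detector scores into a 0/1 sequence and
    resamples it (nearest previous video frame) to the audio latent frame rate.
    video_fps and audio_fps are integers so the index mapping is exact.
    """
    active = (np.asarray(frame_probs) >= threshold).astype(np.float32)
    idx = (np.arange(n_audio_frames) * video_fps) // audio_fps
    idx = np.minimum(idx, len(active) - 1)  # guard against running past the clip
    return active[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    w = {name: rng.standard_normal((d, d))
         for name in ["q", "k_text", "v_text", "k_video", "v_video"]}
    h = rng.standard_normal((16, d))      # audio-token hidden states
    text = rng.standard_normal((4, d))    # text-prompt embeddings
    video = rng.standard_normal((10, d))  # per-frame video features
    out = parallel_cross_attention(h, text, video, w, video_scale=0.5)
    print(out.shape)
    print(timestamp_condition([0.9, 0.1, 0.7, 0.2], video_fps=4, audio_fps=8, n_audio_frames=8))
```

Keeping the text branch intact and adding the video branch alongside it is one plausible way a pre-trained text-to-audio model can accept video conditioning while still responding to text prompts, which is the compatibility the abstract highlights. How FoleyCrafter actually weights, trains, or injects each branch should be checked against the repository.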
