
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

May 1, 2025
Authors: Antoni Bigata, Rodrigo Mira, Stella Bounareli, Michał Stypułkowski, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
cs.AI

Abstract

Lip synchronization, known as the task of aligning lip movements in an existing video with new input audio, is typically framed as a simpler variant of audio-driven facial animation. However, as well as suffering from the usual issues in talking head generation (e.g., temporal consistency), lip synchronization presents significant new challenges such as expression leakage from the input video and facial occlusions, which can severely impact real-world applications like automated dubbing, but are often neglected in existing works. To address these shortcomings, we present KeySync, a two-stage framework that succeeds in solving the issue of temporal consistency, while also incorporating solutions for leakage and occlusions using a carefully designed masking strategy. We show that KeySync achieves state-of-the-art results in lip reconstruction and cross-synchronization, improving visual quality and reducing expression leakage according to LipLeak, our novel leakage metric. Furthermore, we demonstrate the effectiveness of our new masking approach in handling occlusions and validate our architectural choices through several ablation studies. Code and model weights can be found at https://antonibigata.github.io/KeySync.
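
The abstract refers to a "carefully designed masking strategy" for suppressing expression leakage and handling occlusions, but does not describe it in detail. The Python sketch below is purely illustrative of the general idea of hiding the lower face from the generator so it cannot copy lip or expression cues from the input video; the function name, landmark format, and box heuristics are assumptions for illustration, not KeySync's actual masking scheme.

```python
# Illustrative sketch only: zero out a lower-face rectangle derived from
# hypothetical facial landmarks, so the model sees audio but not the source
# video's mouth region. This is NOT the paper's actual masking strategy.
import numpy as np

def mask_lower_face(frame: np.ndarray, landmarks: np.ndarray, margin: float = 0.1) -> np.ndarray:
    """Zero out a box around the mouth/jaw region of a single video frame.

    frame:     H x W x 3 uint8 image.
    landmarks: N x 2 array of (x, y) facial landmarks (assumed to come from any
               off-the-shelf detector); only their bounding box is used here.
    margin:    fraction of the face height added below the landmark box.
    """
    h, w = frame.shape[:2]
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    face_h = y_max - y_min
    # Mask from roughly the middle of the face down to below the chin, so lip
    # movements and nearby expression cues in the source video are hidden.
    top = int(max(0, y_min + 0.5 * face_h))
    bottom = int(min(h, y_max + margin * face_h))
    left, right = int(max(0, x_min)), int(min(w, x_max))
    masked = frame.copy()
    masked[top:bottom, left:right] = 0
    return masked
```

In practice such a mask would be applied per frame before conditioning the generator on new audio; the paper's own design additionally accounts for occlusions (e.g., hands or microphones over the mouth), which this toy rectangle does not.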
