KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
May 1, 2025
Authors: Antoni Bigata, Rodrigo Mira, Stella Bounareli, Michał Stypułkowski, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
cs.AI
Abstract
Lip synchronization, known as the task of aligning lip movements in an
existing video with new input audio, is typically framed as a simpler variant
of audio-driven facial animation. However, as well as suffering from the usual
issues in talking head generation (e.g., temporal consistency), lip
synchronization presents significant new challenges such as expression leakage
from the input video and facial occlusions, which can severely impact
real-world applications like automated dubbing, but are often neglected in
existing works. To address these shortcomings, we present KeySync, a two-stage
framework that succeeds in solving the issue of temporal consistency, while
also incorporating solutions for leakage and occlusions using a carefully
designed masking strategy. We show that KeySync achieves state-of-the-art
results in lip reconstruction and cross-synchronization, improving visual
quality and reducing expression leakage according to LipLeak, our novel leakage
metric. Furthermore, we demonstrate the effectiveness of our new masking
approach in handling occlusions and validate our architectural choices through
several ablation studies. Code and model weights can be found at
https://antonibigata.github.io/KeySync.
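To make the leakage problem concrete, the sketch below illustrates the general idea behind masking the input video before synthesis: if the mouth region is occluded in the conditioning frames, the generator must derive lip motion from the audio rather than copying expression cues from the original video. This is a hypothetical, simplified rectangle mask for illustration only; the actual mask design used by KeySync is described in the paper, not here.

```python
# Hypothetical illustration of input-video masking for leakage-free lip sync.
# The rectangular lower-face mask and the 0.55 split fraction are assumptions
# for demonstration; they are NOT the mask strategy proposed by KeySync.
import numpy as np


def mask_lower_face(frames: np.ndarray, top_frac: float = 0.55) -> np.ndarray:
    """Zero out the lower portion of each aligned face crop.

    frames: (T, H, W, C) uint8 video of aligned face crops.
    top_frac: fraction of the image height kept; rows below it are masked.
    """
    masked = frames.copy()
    h = frames.shape[1]
    masked[:, int(top_frac * h):, :, :] = 0  # occlude mouth/jaw region
    return masked


if __name__ == "__main__":
    # Example usage on dummy data: 4 frames of 256x256 RGB.
    dummy = np.random.randint(0, 256, size=(4, 256, 256, 3), dtype=np.uint8)
    out = mask_lower_face(dummy)
    assert out.shape == dummy.shape and (out[:, -1] == 0).all()
```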