
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

August 6, 2024
Authors: Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu
cs.AI

Abstract

Lip-syncing videos with given audio is the foundation for various applications, including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-oriented models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose ReSyncer, a unified and effective framework that synchronizes generalized audio-visual facial information. The key design is to revisit and rewire the style-based generator so that it efficiently adopts 3D facial dynamics predicted by a principled style-injected Transformer. By simply reconfiguring the information insertion mechanisms within the noise and style space, our framework fuses motion and appearance with unified training. Extensive experiments demonstrate that ReSyncer not only produces high-fidelity lip-synced videos according to audio, but also supports multiple appealing properties suitable for creating virtual presenters and performers, including fast personalized fine-tuning, video-driven lip-syncing, the transfer of speaking styles, and even face swapping. Resources can be found at https://guanjz20.github.io/projects/ReSyncer.

