

UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model

August 1, 2024
Authors: Xiangyu Fan, Jiaqi Li, Zhiqian Lin, Weiye Xiao, Lei Yang
cs.AI

Abstract

Audio-driven 3D facial animation aims to map input audio to realistic facial motion. Despite significant progress, previous models have been limited by inconsistent 3D annotations, which restrict training to a single annotation convention and thereby constrain the training scale. In this work, we present UniTalker, a unified model featuring a multi-head architecture designed to effectively leverage datasets with varied annotations. To enhance training stability and ensure consistency among multi-head outputs, we employ three training strategies, namely PCA, model warm-up, and pivot identity embedding. To expand the training scale and diversity, we assemble A2F-Bench, comprising five publicly available datasets and three newly curated datasets. These datasets span a wide range of audio domains, covering multilingual speech and songs, and scale the training data from the less than 1 hour typical of commonly used datasets to 18.5 hours. With a single trained UniTalker model, we achieve substantial lip vertex error reductions of 9.2% on the BIWI dataset and 13.7% on Vocaset. Additionally, the pre-trained UniTalker shows promise as a foundation model for audio-driven facial animation tasks. Fine-tuning the pre-trained UniTalker on seen datasets further enhances performance on each dataset, with an average error reduction of 6.3% on A2F-Bench. Moreover, fine-tuning UniTalker on an unseen dataset with only half the data surpasses prior state-of-the-art models trained on the full dataset. The code and dataset are available at the project page: https://github.com/X-niper/UniTalker.
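To make the multi-head idea concrete, here is a minimal, hypothetical PyTorch sketch of an audio-to-face model with a shared backbone, a per-sample identity embedding, and one output head per annotation convention. All class names, layer choices, dimensions, and dataset keys are illustrative assumptions, not the official UniTalker implementation; the actual model and its training strategies (PCA-reduced targets, model warm-up, pivot identity embedding) are provided in the repository at https://github.com/X-niper/UniTalker.

```python
# Hypothetical sketch of a multi-head audio-to-face model.
# Not the official UniTalker code; dimensions and names are placeholders.
import torch
import torch.nn as nn

class MultiHeadA2F(nn.Module):
    def __init__(self, audio_dim=768, hidden_dim=256, head_dims=None, num_identities=16):
        # head_dims maps each dataset/annotation convention to its output size,
        # e.g. the number of (possibly PCA-reduced) motion coefficients per frame.
        super().__init__()
        head_dims = head_dims or {"vocaset": 64, "biwi": 64}
        # Shared temporal backbone over precomputed audio features.
        self.backbone = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        # Identity embedding conditions motion on speaking style; one shared
        # "pivot" identity index could be reserved to align the heads.
        self.identity_emb = nn.Embedding(num_identities, hidden_dim)
        # One output head per annotation convention.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_dim, dim) for name, dim in head_dims.items()}
        )

    def forward(self, audio_feat, identity_id, head_name):
        # audio_feat: (batch, frames, audio_dim) features from an audio encoder.
        h, _ = self.backbone(audio_feat)
        h = h + self.identity_emb(identity_id).unsqueeze(1)
        # Only the head matching this sample's annotation is used for the loss.
        return self.heads[head_name](h)

model = MultiHeadA2F()
audio = torch.randn(2, 100, 768)                 # 2 clips, 100 frames of features
identity = torch.tensor([0, 3])                  # per-clip identity indices
coeffs = model(audio, identity, "vocaset")       # (2, 100, 64) motion coefficients
```

In this sketch, each training batch routes its loss through the head matching its dataset, so all datasets share the backbone and identity embedding while keeping annotation-specific outputs separate.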
