Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
May 5, 2023
Authors: Ekta Prashnani, Koki Nagano, Shalini De Mello, David Luebke, Orazio Gallo
cs.AI
Abstract
Modern generators render talking-head videos with impressive levels of
photorealism, ushering in new user experiences such as videoconferencing under
constrained bandwidth budgets. Their safe adoption, however, requires a
mechanism to verify if the rendered video is trustworthy. For instance, for
videoconferencing we must identify cases in which a synthetic video portrait
uses the appearance of an individual without their consent. We term this task
avatar fingerprinting. We propose to tackle it by leveraging facial motion
signatures unique to each person. Specifically, we learn an embedding in which
the motion signatures of one identity are grouped together, and pushed away
from those of other identities, regardless of the appearance in the synthetic
video. Avatar fingerprinting algorithms will be critical as talking-head
generators become more ubiquitous, and yet no large-scale datasets exist for
this new task. Therefore, we contribute a large dataset of people delivering
scripted and improvised short monologues, accompanied by synthetic videos in
which we render videos of one person using the facial appearance of another.
Project page: https://research.nvidia.com/labs/nxp/avatar-fingerprinting/.
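The abstract describes learning an embedding in which motion signatures of the same identity cluster together while those of other identities are pushed apart, regardless of appearance. As an illustration only (the paper's actual architecture and loss are not given in the abstract), a minimal margin-based contrastive objective over hypothetical motion-signature embeddings might look like the sketch below; all function names, tensor shapes, and the margin value are assumptions:

```python
# Illustrative sketch only: a margin-based contrastive loss over motion
# embeddings. Names, shapes, and the margin are assumptions, not the
# paper's actual objective.
import torch
import torch.nn.functional as F

def identity_contrastive_loss(anchor, positive, negatives, margin=0.2):
    """anchor, positive: (B, D) embeddings of two clips of the SAME identity.
    negatives: (B, K, D) embeddings of clips from OTHER identities."""
    # Cosine distances are computed on unit-normalized embeddings.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_dist = 1.0 - (anchor * positive).sum(dim=-1)                # (B,)
    neg_dist = 1.0 - (anchor.unsqueeze(1) * negatives).sum(dim=-1)  # (B, K)

    # Hinge: each positive pair must be closer than every negative by `margin`.
    return F.relu(pos_dist.unsqueeze(1) - neg_dist + margin).mean()

# Toy usage with random tensors standing in for motion-signature embeddings.
a, p = torch.randn(8, 128), torch.randn(8, 128)
n = torch.randn(8, 4, 128)
print(identity_contrastive_loss(a, p, n))
```

Because the objective operates on motion rather than appearance, it is consistent with the abstract's claim that clips group by driving identity even when a synthetic video renders one person with another person's facial appearance.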