Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
May 5, 2023
Authors: Ekta Prashnani, Koki Nagano, Shalini De Mello, David Luebke, Orazio Gallo
cs.AI
Abstract
Modern generators render talking-head videos with impressive levels of
photorealism, ushering in new user experiences such as videoconferencing under
constrained bandwidth budgets. Their safe adoption, however, requires a
mechanism to verify if the rendered video is trustworthy. For instance, for
videoconferencing we must identify cases in which a synthetic video portrait
uses the appearance of an individual without their consent. We term this task
avatar fingerprinting. We propose to tackle it by leveraging facial motion
signatures unique to each person. Specifically, we learn an embedding in which
the motion signatures of one identity are grouped together, and pushed away
from those of other identities, regardless of the appearance in the synthetic
video. Avatar fingerprinting algorithms will be critical as talking-head
generators become more ubiquitous, and yet no large-scale datasets exist for
this new task. Therefore, we contribute a large dataset of people delivering
scripted and improvised short monologues, accompanied by synthetic videos in
which one person's performance is rendered with the facial appearance of another.
Project page: https://research.nvidia.com/labs/nxp/avatar-fingerprinting/.
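The abstract's core technical idea is a contrastive embedding space in which motion signatures of the same driving identity cluster together and are pushed away from those of other identities, independent of the rendered appearance. The following is a minimal PyTorch sketch of that idea, not the authors' actual model: `MotionEncoder`, `contrastive_fingerprint_loss`, the feature dimensionality, and the margin value are all illustrative assumptions, and each batch is assumed to contain several clips per driving identity.

```python
# Minimal sketch: contrastive learning over facial-motion signatures.
# Labels are the *driving* identity, not the rendered appearance, so
# embeddings of one person's motion cluster together even when the
# synthetic video shows someone else's face.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionEncoder(nn.Module):
    """Maps a sequence of facial-motion features (e.g. per-frame
    landmark displacements) to a fixed-size, unit-norm embedding."""

    def __init__(self, feat_dim: int = 136, embed_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True)

    def forward(self, motion_seq: torch.Tensor) -> torch.Tensor:
        # motion_seq: (batch, time, feat_dim)
        _, h = self.rnn(motion_seq)          # h: (layers, batch, embed_dim)
        return F.normalize(h[-1], dim=-1)    # unit-norm embeddings


def contrastive_fingerprint_loss(emb: torch.Tensor,
                                 driver_ids: torch.Tensor,
                                 margin: float = 0.5) -> torch.Tensor:
    """Pull same-identity embeddings together; push different
    identities at least `margin` apart. Assumes the batch contains
    at least one positive pair per identity."""
    dist = torch.cdist(emb, emb)                        # pairwise distances
    same = driver_ids[:, None] == driver_ids[None, :]   # positive-pair mask
    eye = torch.eye(len(emb), dtype=torch.bool, device=emb.device)
    pos = dist[same & ~eye].pow(2).mean()               # attract positives
    neg = F.relu(margin - dist[~same]).pow(2).mean()    # repel negatives
    return pos + neg
```

At inference time, a video's fingerprint would be its embedding under such an encoder; unauthorized use could then be flagged when a synthetic video's motion embedding is far from the consenting identity's reference cluster.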