FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

Abstract

Identity-preserving text-to-video generation (IPT2V) lets users produce diverse and imaginative videos while keeping a person's facial identity consistent. Despite recent progress, existing methods often suffer from severe identity distortion under large facial pose variations or facial occlusions. In this paper, we propose FaithfulFaces, a pose-faithful facial identity preservation learning framework that improves IPT2V in complex dynamic scenes. At the core of FaithfulFaces is a pose-shared identity aligner that refines and aligns facial poses across distinct views via a pose-shared dictionary and a pose variation-identity invariance constraint. By mapping single-view inputs into a global facial pose representation with explicit Euler angle embeddings, FaithfulFaces provides a pose-faithful facial prior that guides generative foundation models toward robust identity-preserving generation. We also develop a specialized pipeline to curate a high-quality video dataset with substantial facial pose diversity. Extensive experiments demonstrate that FaithfulFaces achieves state-of-the-art performance, maintaining superior identity consistency and structural clarity even when pose changes and occlusions occur.
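The abstract does not say how the explicit Euler angle embeddings are computed. As a rough illustration only, the sketch below assumes a sinusoidal encoding of the three head-pose angles (yaw, pitch, roll). The function name `euler_angle_embedding` and the parameter `num_frequencies` are hypothetical and are not taken from the paper; the authors' actual embedding may differ.

```python
import math


def euler_angle_embedding(yaw_deg, pitch_deg, roll_deg, num_frequencies=4):
    """Hypothetical sinusoidal embedding of a head pose given as Euler angles.

    Assumption: each angle is encoded with sin/cos pairs at doubling
    frequencies. Returns a flat list of length 3 * 2 * num_frequencies.
    """
    embedding = []
    for angle_deg in (yaw_deg, pitch_deg, roll_deg):
        angle = math.radians(angle_deg)
        for k in range(num_frequencies):
            freq = 2 ** k  # frequencies 1, 2, 4, ...
            embedding.append(math.sin(freq * angle))
            embedding.append(math.cos(freq * angle))
    return embedding


# A frontal face and a 90-degree profile view map to different vectors.
frontal = euler_angle_embedding(0.0, 0.0, 0.0)
profile = euler_angle_embedding(90.0, 0.0, 0.0)
print(len(frontal), frontal[:2], profile[:2])
```

One reason to consider a periodic encoding like this is that poses differing by a full rotation map to the same vector, which a raw angle value would not guarantee.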
