EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion

January 23, 2025
Authors: Jiangchuan Wei, Shiyue Yan, Wenfeng Lin, Boyuan Liu, Renjie Chen, Mingyu Guo
cs.AI

Abstract

Recent advances in video generation have significantly impacted various downstream applications, particularly identity-preserving video generation (IPT2V). However, existing methods struggle with "copy-paste" artifacts and low similarity, primarily because they rely too heavily on low-level facial image information. This dependence can result in rigid facial appearances and artifacts that reflect irrelevant details. To address these challenges, we propose EchoVideo, which employs two key strategies: (1) an Identity Image-Text Fusion Module (IITF) that integrates high-level semantic features from text, capturing clean facial identity representations while discarding occlusions, poses, and lighting variations to avoid introducing artifacts; (2) a two-stage training strategy whose second stage incorporates a stochastic method that randomly utilizes shallow facial information. The objective is to retain the fidelity gains that shallow features provide while mitigating excessive reliance on them. This strategy encourages the model to utilize high-level features during training, ultimately fostering a more robust representation of facial identities. EchoVideo effectively preserves facial identities and maintains full-body integrity. Extensive experiments demonstrate that it achieves excellent results in generating videos with high quality, controllability, and fidelity.
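
The stochastic use of shallow facial information in the second training stage can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no probability, feature format, or function names, so `select_identity_conditions`, `p_use_shallow`, and the placeholder feature values below are all hypothetical. The sketch only shows the general idea of always supplying high-level identity features while supplying shallow features at random.

```python
import random


def select_identity_conditions(high_level, shallow, p_use_shallow, rng):
    """Pick the conditioning inputs for one training step (illustrative only).

    high_level: identity features that are always supplied.
    shallow: low-level facial features, supplied only some of the time.
    p_use_shallow: hypothetical probability of keeping the shallow features.
    rng: a random.Random instance, passed in for reproducibility.
    """
    if rng.random() < p_use_shallow:
        return {"high_level": high_level, "shallow": shallow}
    # Shallow features withheld: the model has to rely on high-level identity cues.
    return {"high_level": high_level, "shallow": None}


# Placeholder strings stand in for real feature tensors (assumption).
rng = random.Random(0)
steps = [
    select_identity_conditions("id_feat", "face_feat", 0.5, rng)
    for _ in range(1000)
]
used = sum(step["shallow"] is not None for step in steps)
```

With a probability strictly between 0 and 1, some steps see the shallow features and others do not, which is one simple way to keep their fidelity benefit without letting the model depend on them at every step.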
