Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality
December 8, 2025
Authors: Zekai Luo, Zongze Du, Zhouhang Zhu, Hao Zhong, Muzhi Zhu, Wen Wang, Yuling Xi, Chenchen Jing, Hao Chen, Chunhua Shen
cs.AI
Abstract
Video face swapping is crucial in film and entertainment production, where achieving high fidelity and temporal consistency over long and complex video sequences remains a significant challenge. Inspired by recent advances in reference-guided image editing, we explore whether rich visual attributes from source videos can be similarly leveraged to enhance both fidelity and temporal coherence in video face swapping. Building on this insight, we present LivingSwap, the first video-reference-guided face-swapping model. Our approach employs keyframes as conditioning signals to inject the target identity, enabling flexible and controllable editing. By combining keyframe conditioning with video reference guidance, the model performs temporal stitching to ensure stable identity preservation and high-fidelity reconstruction across long video sequences. To address the scarcity of data for reference-guided training, we construct a paired face-swapping dataset, Face2Face, and further reverse the data pairs to obtain reliable ground-truth supervision. Extensive experiments demonstrate that our method achieves state-of-the-art results, seamlessly integrating the target identity with the source video's expressions, lighting, and motion, while significantly reducing manual effort in production workflows. Project webpage: https://aim-uofa.github.io/LivingSwap
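To make the temporal-stitching idea concrete, below is a minimal sketch of how keyframe conditioning and overlapped-window blending could be combined on a long sequence. The abstract does not specify a windowing scheme, so the window/overlap parameters, the cross-fade, the keyframe-to-window assignment, and the `swap_window` stand-in for the generative backbone are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def swap_window(frames: np.ndarray, keyframe: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the generative backbone: in LivingSwap this
    # would render the target identity into the window, guided by the source
    # frames and an identity-injected keyframe. An identity map keeps the
    # sketch runnable.
    return frames.astype(np.float32)

def stitch_long_video(frames: np.ndarray, keyframes: list,
                      window: int = 16, overlap: int = 4) -> np.ndarray:
    """Edit a long sequence in overlapping windows, then cross-fade the
    overlaps so identity and motion stay consistent across boundaries."""
    n = len(frames)
    out = np.zeros(frames.shape, dtype=np.float32)
    weight = np.zeros(n, dtype=np.float32)
    step = window - overlap
    for i, start in enumerate(range(0, n, step)):
        end = min(start + window, n)
        clip = swap_window(frames[start:end], keyframes[i % len(keyframes)])
        w = np.ones(end - start, dtype=np.float32)
        if start > 0:  # fade in from the previous window's output
            w[:overlap] = np.linspace(0.0, 1.0, overlap)
        if end < n:    # fade out toward the next window's output
            w[-overlap:] = np.minimum(w[-overlap:], np.linspace(1.0, 0.0, overlap))
        out[start:end] += clip * w[:, None, None, None]
        weight[start:end] += w
        if end == n:
            break
    return out / weight[:, None, None, None]

# Dummy usage: 100 RGB frames at 64x64 with two identity keyframes.
video = np.random.rand(100, 64, 64, 3).astype(np.float32)
result = stitch_long_video(video, [video[0], video[50]])
assert result.shape == video.shape
```

With these defaults the fade-in and fade-out weights on each overlap sum to one, so the blended output is a proper convex combination of adjacent windows rather than a hard cut at window boundaries.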
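The reversed-pair construction can be sketched similarly. One plausible reading of the abstract: run an existing swapper on a real clip to synthesize the "source" side of a pair, then train in the reversed direction, so the untouched real clip serves as pixel-exact ground truth. The names `pretrained_swap`, `real_clips`, and `donor_faces` are hypothetical, and the dummy swapper below is a placeholder so the sketch runs.

```python
import numpy as np

def pretrained_swap(clip: np.ndarray, donor_face: np.ndarray) -> np.ndarray:
    # Hypothetical off-the-shelf swapper used only for data construction;
    # a noisy copy stands in for a real model here.
    return np.clip(clip + 0.01 * np.random.randn(*clip.shape), 0.0, 1.0)

def build_reversed_pairs(real_clips, donor_faces):
    """Synthesize swapped clips, then reverse each pair: the model's input is
    the synthetic clip plus a keyframe of the real identity, and the target
    is the original real clip, giving reliable ground-truth supervision."""
    pairs = []
    for clip, donor in zip(real_clips, donor_faces):
        swapped = pretrained_swap(clip, donor)
        pairs.append({
            "source_video": swapped,       # model input: synthetic identity
            "identity_keyframe": clip[0],  # condition: real identity to restore
            "ground_truth": clip,          # target: untouched real footage
        })
    return pairs

# Dummy usage: two 8-frame clips paired with two donor faces.
clips = [np.random.rand(8, 64, 64, 3) for _ in range(2)]
donors = [np.random.rand(64, 64, 3) for _ in range(2)]
dataset = build_reversed_pairs(clips, donors)
assert dataset[0]["ground_truth"].shape == (8, 64, 64, 3)
```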