DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

January 4, 2026
Authors: Xu Guo, Fulong Ye, Xinghui Li, Pengqi Tu, Pengze Zhang, Qichao Sun, Songtao Zhao, Xiangwang Hou, Qian He
cs.AI

Abstract

Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video while meticulously preserving the original pose, expression, lighting, background, and dynamic information. Existing methods struggle to achieve both identity similarity and attribute preservation while maintaining temporal consistency. To address this challenge, we propose a comprehensive framework that transfers the strengths of Image Face Swapping (IFS) to the video domain. We first introduce a novel data pipeline, SyncID-Pipe, which pre-trains an Identity-Anchored Video Synthesizer and combines it with IFS models to construct bidirectional ID quadruplets for explicit supervision. Building on this paired data, we propose DreamID-V, the first Diffusion Transformer-based framework, which employs a core Modality-Aware Conditioning module to discriminatively inject multi-modal conditions. We also propose a Synthetic-to-Real Curriculum mechanism and an Identity-Coherence Reinforcement Learning strategy to enhance visual realism and identity consistency in challenging scenarios. To address the limitations of existing benchmarks, we introduce IDBench-V, a comprehensive benchmark encompassing diverse scenes. Extensive experiments demonstrate that DreamID-V outperforms state-of-the-art methods and exhibits exceptional versatility, adapting seamlessly to various swap-related tasks.