DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
January 4, 2026
Authors: Xu Guo, Fulong Ye, Xinghui Li, Pengqi Tu, Pengze Zhang, Qichao Sun, Songtao Zhao, Xiangwang Hou, Qian He
cs.AI
Abstract
Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video while meticulously preserving the original pose, expression, lighting, background, and dynamic information. Existing methods struggle to achieve both identity similarity and attribute preservation while keeping the output temporally consistent. To address this challenge, we propose a comprehensive framework that transfers the strengths of Image Face Swapping (IFS) to the video domain. We first introduce a novel data pipeline, SyncID-Pipe, that pre-trains an Identity-Anchored Video Synthesizer and combines it with IFS models to construct bidirectional ID quadruplets for explicit supervision. Building upon this paired data, we propose DreamID-V, the first Diffusion Transformer-based framework for this task, whose core Modality-Aware Conditioning module discriminatively injects multi-modal conditions. In addition, we propose a Synthetic-to-Real Curriculum mechanism and an Identity-Coherence Reinforcement Learning strategy to enhance visual realism and identity consistency in challenging scenarios. To address the scarcity of benchmarks, we introduce IDBench-V, a comprehensive benchmark encompassing diverse scenes. Extensive experiments demonstrate that DreamID-V outperforms state-of-the-art methods and exhibits exceptional versatility: it can be seamlessly adapted to various swap-related tasks.