Durian: Dual Reference-guided Portrait Animation with Attribute Transfer
September 4, 2025
Authors: Hyunsoo Cha, Byungjun Kim, Hanbyul Joo
cs.AI
Abstract
We present Durian, the first method for generating portrait animation videos
with facial attribute transfer from a given reference image to a target
portrait in a zero-shot manner. To enable high-fidelity and spatially
consistent attribute transfer across frames, we introduce dual reference
networks that inject spatial features from both the portrait and attribute
images into the denoising process of a diffusion model. We train the model
using a self-reconstruction formulation, where two frames are sampled from the
same portrait video: one is treated as the attribute reference and the other as
the target portrait, and the remaining frames are reconstructed conditioned on
these inputs and their corresponding masks. To support the transfer of
attributes with varying spatial extent, we propose a mask expansion strategy
using keypoint-conditioned image generation for training. In addition, we
augment the attribute and portrait images with spatial and
appearance-level transformations to improve robustness to positional
misalignment between them. These strategies allow the model to effectively
generalize across diverse attributes and in-the-wild reference combinations,
despite being trained without explicit triplet supervision. Durian achieves
state-of-the-art performance on portrait animation with attribute transfer, and
notably, its dual reference design enables multi-attribute composition in a
single generation pass without additional training.
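The self-reconstruction formulation described above can be sketched as a data-sampling step: two frames are drawn from the same portrait video to act as the attribute and portrait references, and the remaining frames become reconstruction targets. The sketch below is illustrative only; the function and field names are assumptions, not the authors' actual code, and the masks are taken as given per-frame annotations.

```python
import random

def make_self_reconstruction_sample(video_frames, masks):
    """Build one training example in the self-reconstruction style:
    two frames from the same portrait video serve as references
    (one for the attribute, one for the target portrait), and the
    remaining frames are the reconstruction targets.

    `video_frames` and `masks` are parallel lists; each mask marks the
    spatial extent of the attribute (e.g., glasses or a hairstyle).
    All names here are hypothetical, for illustration only.
    """
    assert len(video_frames) >= 3, "need two references plus targets"
    i, j = random.sample(range(len(video_frames)), 2)
    target_ids = [k for k in range(len(video_frames)) if k not in (i, j)]
    return {
        "attribute_ref": video_frames[i],   # provides the facial attribute
        "attribute_mask": masks[i],
        "portrait_ref": video_frames[j],    # provides identity and pose
        "portrait_mask": masks[j],
        "targets": [video_frames[k] for k in target_ids],
    }
```

In this setup no explicit triplet supervision (portrait, attribute, ground-truth composite) is needed: because both references come from the same video, the held-out frames already show the correct composition, which is what lets the abstract claim zero-shot generalization to in-the-wild reference pairs.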