Durian: Dual Reference-guided Portrait Animation with Attribute Transfer
September 4, 2025
Authors: Hyunsoo Cha, Byungjun Kim, Hanbyul Joo
cs.AI
Abstract
We present Durian, the first method for generating portrait animation videos
with facial attribute transfer from a given reference image to a target
portrait in a zero-shot manner. To enable high-fidelity and spatially
consistent attribute transfer across frames, we introduce dual reference
networks that inject spatial features from both the portrait and attribute
images into the denoising process of a diffusion model. We train the model
using a self-reconstruction formulation, where two frames are sampled from the
same portrait video: one is treated as the attribute reference and the other as
the target portrait, and the remaining frames are reconstructed conditioned on
these inputs and their corresponding masks. To support the transfer of
attributes with varying spatial extent, we propose a mask expansion strategy
using keypoint-conditioned image generation for training. In addition, we
augment the attribute and portrait images with spatial and
appearance-level transformations to improve robustness to positional
misalignment between them. These strategies allow the model to effectively
generalize across diverse attributes and in-the-wild reference combinations,
despite being trained without explicit triplet supervision. Durian achieves
state-of-the-art performance on portrait animation with attribute transfer, and
notably, its dual reference design enables multi-attribute composition in a
single generation pass without additional training.
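The self-reconstruction formulation described above can be sketched as a data-sampling step: two frames are drawn from the same portrait video to act as the attribute and portrait references, and the remaining frames become reconstruction targets. The sketch below is illustrative only; the function and field names are assumptions, not the authors' actual code, and the masks are taken as given per-frame annotations.

```python
import random

def make_self_reconstruction_sample(video_frames, masks):
    """Build one training example in the self-reconstruction style:
    two frames from the same portrait video serve as references
    (one for the attribute, one for the target portrait), and the
    remaining frames are the reconstruction targets.

    `video_frames` and `masks` are parallel lists; each mask marks the
    spatial extent of the attribute (e.g., glasses or a hairstyle).
    All names here are hypothetical, for illustration only.
    """
    assert len(video_frames) >= 3, "need two references plus targets"
    i, j = random.sample(range(len(video_frames)), 2)
    target_ids = [k for k in range(len(video_frames)) if k not in (i, j)]
    return {
        "attribute_ref": video_frames[i],   # provides the facial attribute
        "attribute_mask": masks[i],
        "portrait_ref": video_frames[j],    # provides identity and pose
        "portrait_mask": masks[j],
        "targets": [video_frames[k] for k in target_ids],
    }
```

In this setup no explicit triplet supervision (portrait, attribute, ground-truth composite) is needed: because both references come from the same video, the held-out frames already show the correct composition, which is what lets the abstract claim zero-shot generalization to in-the-wild reference pairs.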