

Durian: Dual Reference-guided Portrait Animation with Attribute Transfer

September 4, 2025
Authors: Hyunsoo Cha, Byungjun Kim, Hanbyul Joo
cs.AI

Abstract

We present Durian, the first method for generating portrait animation videos with facial attribute transfer from a given reference image to a target portrait in a zero-shot manner. To enable high-fidelity and spatially consistent attribute transfer across frames, we introduce dual reference networks that inject spatial features from both the portrait and attribute images into the denoising process of a diffusion model. We train the model using a self-reconstruction formulation, where two frames are sampled from the same portrait video: one is treated as the attribute reference and the other as the target portrait, and the remaining frames are reconstructed conditioned on these inputs and their corresponding masks. To support the transfer of attributes with varying spatial extent, we propose a mask expansion strategy using keypoint-conditioned image generation for training. In addition, we augment the attribute and portrait images with spatial and appearance-level transformations to improve robustness to positional misalignment between them. These strategies allow the model to effectively generalize across diverse attributes and in-the-wild reference combinations, despite being trained without explicit triplet supervision. Durian achieves state-of-the-art performance on portrait animation with attribute transfer, and notably, its dual reference design enables multi-attribute composition in a single generation pass without additional training.
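The self-reconstruction formulation lends itself to a compact sketch. The PyTorch-style pseudo-implementation below illustrates one training step as the abstract describes it: sample two frames from the same video as the attribute and portrait references, apply spatial and appearance augmentations, extract spatial features with two reference encoders, and supervise the denoising of the remaining frames. This is a minimal sketch under stated assumptions, not the authors' released code: the module names and signatures (`attr_encoder`, `portrait_encoder`, `unet` with a `reference_features` argument, a diffusers-style `scheduler`) and all augmentation parameters are hypothetical illustrations.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Spatial + appearance augmentation applied to both references to build
# robustness to positional misalignment between them (parameters are
# illustrative, not the paper's actual settings).
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
])

def self_reconstruction_step(video, masks, attr_encoder, portrait_encoder,
                             unet, scheduler):
    """One training step on a single portrait video.

    video: (B, T, C, H, W) frames; masks: (B, T, 1, H, W) attribute masks.
    attr_encoder / portrait_encoder / unet / scheduler are hypothetical
    stand-ins for the dual reference networks, denoising UNet, and noise
    scheduler described in the abstract.
    """
    T = video.shape[1]
    i, j = torch.randperm(T)[:2].tolist()          # two distinct frame indices
    attr_ref = augment(video[:, i] * masks[:, i])  # masked attribute reference
    portrait_ref = augment(video[:, j])            # target portrait reference
    rest = [k for k in range(T) if k not in (i, j)]
    target = video[:, rest]                        # frames to reconstruct

    # Dual reference networks: spatial feature maps from each reference,
    # to be injected into the UNet's denoising process.
    ref_feats = (attr_encoder(attr_ref), portrait_encoder(portrait_ref))

    # Standard diffusion objective: noise the target frames at a random
    # timestep and train the UNet to predict that noise given both references.
    noise = torch.randn_like(target)
    t = torch.randint(0, scheduler.num_train_timesteps, (video.shape[0],),
                      device=video.device)
    noisy = scheduler.add_noise(target, noise, t)
    pred = unet(noisy, t, reference_features=ref_feats)
    return F.mse_loss(pred, noise)
```

At inference time the attribute reference would come from a different identity than the portrait; the augmentations above are what let a model trained only on same-video pairs generalize to such in-the-wild reference combinations without explicit triplet supervision.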