Durian: Dual Reference-guided Portrait Animation with Attribute Transfer

Abstract

We present Durian, the first method for generating portrait animation videos with facial attribute transfer from a given reference image to a target portrait in a zero-shot manner. To enable high-fidelity and spatially consistent attribute transfer across frames, we introduce dual reference networks that inject spatial features from both the portrait and attribute images into the denoising process of a diffusion model. We train the model with a self-reconstruction formulation in which two frames are sampled from the same portrait video: one is treated as the attribute reference and the other as the target portrait, and the remaining frames are reconstructed conditioned on these inputs and their corresponding masks. To support the transfer of attributes with varying spatial extent, we propose a mask expansion strategy for training that uses keypoint-conditioned image generation. We further augment the attribute and portrait images with spatial and appearance-level transformations to improve robustness to positional misalignment between them. These strategies allow the model to generalize effectively across diverse attributes and in-the-wild reference combinations, despite being trained without explicit triplet supervision. Durian achieves state-of-the-art performance on portrait animation with attribute transfer, and its dual reference design enables multi-attribute composition in a single generation pass without additional training.
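
The self-reconstruction sampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `TrainingSample` and `sample_self_reconstruction`, the `num_targets` parameter, and the choice to reconstruct a temporally ordered window of the remaining frames are all assumptions made for illustration. Mask computation and the augmentations are omitted.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    """Frame indices for one self-reconstruction example (hypothetical structure)."""
    attribute_idx: int         # frame treated as the attribute reference
    portrait_idx: int          # frame treated as the target portrait
    target_indices: List[int]  # remaining frames the model must reconstruct


def sample_self_reconstruction(
    num_frames: int, num_targets: int, rng: random.Random
) -> TrainingSample:
    """Pick two distinct reference frames from one video, then a window of
    the remaining frames as reconstruction targets (assumed policy)."""
    if num_frames < 3:
        raise ValueError("need at least 3 frames: two references and one target")
    if num_targets < 1:
        raise ValueError("num_targets must be at least 1")

    # Two distinct frames from the same video serve as the two references.
    attribute_idx, portrait_idx = rng.sample(range(num_frames), 2)

    # Every other frame is a candidate reconstruction target.
    remaining = [
        i for i in range(num_frames) if i not in (attribute_idx, portrait_idx)
    ]

    # Assumption: take a temporally ordered window of the remaining frames.
    num_targets = min(num_targets, len(remaining))
    start = rng.randint(0, len(remaining) - num_targets)
    target_indices = remaining[start:start + num_targets]

    return TrainingSample(attribute_idx, portrait_idx, target_indices)


if __name__ == "__main__":
    sample = sample_self_reconstruction(
        num_frames=32, num_targets=8, rng=random.Random(0)
    )
    print(sample)
```

Because both references come from the same video as the targets, no explicit (attribute, portrait, result) triplet is required; the reconstruction targets themselves act as the supervision signal.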
