SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning

Abstract

Fine-tuning a pre-trained Text-to-Image (T2I) model on a tailored portrait dataset is the mainstream approach to text-driven customization of portrait attributes. However, because of Semantic Pollution during fine-tuning, existing methods struggle to preserve the original model's behavior and to achieve incremental learning while customizing target attributes. To address this issue, we propose SPF-Portrait, a pioneering work that learns customized semantics purely, eliminating semantic pollution in text-driven portrait customization. SPF-Portrait adopts a dual-path pipeline that introduces the original model as a reference for the conventional fine-tuning path. Through contrastive learning, we ensure adaptation to the target attributes while purposefully aligning other, unrelated attributes with the original portrait. We introduce a novel Semantic-Aware Fine Control Map, which represents the precise response regions of the target semantics, to spatially guide the alignment between the contrastive paths. This spatially guided alignment effectively preserves the performance of the original model while avoiding over-alignment. Furthermore, we propose a novel response enhancement mechanism that reinforces the performance of target attributes while mitigating the representation discrepancy inherent in direct cross-modal supervision. Extensive experiments demonstrate that SPF-Portrait achieves state-of-the-art performance. Project webpage: https://spf-portrait.github.io/SPF-Portrait/
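The abstract does not give the training losses. As a rough illustration of the idea of spatially guided alignment only, the sketch below assumes that the control map is a per-position weight in [0, 1] (1 = region responding to the target attribute) and that alignment is a squared difference between the fine-tuning path and the frozen reference path, applied only outside the target region. The function name `masked_alignment_loss` and the 2D-list representation are hypothetical; this is not the authors' implementation.

```python
from typing import List

Grid = List[List[float]]


def masked_alignment_loss(pred_ft: Grid, pred_ref: Grid, target_map: Grid) -> float:
    """Weighted mean squared difference between the fine-tuning path and the
    frozen reference path, using weight (1 - target_map) at each position.

    Hypothetical sketch: target_map stands in for a Semantic-Aware Fine
    Control Map. Positions outside the target region are pulled toward the
    reference model's output; positions inside it get zero alignment weight
    and are left free to adapt to the target attribute.
    """
    total, weight_sum = 0.0, 0.0
    for row_ft, row_ref, row_m in zip(pred_ft, pred_ref, target_map):
        for a, b, m in zip(row_ft, row_ref, row_m):
            w = 1.0 - m  # alignment weight is high outside the target region
            total += w * (a - b) ** 2
            weight_sum += w
    # Normalise by total weight so the loss scale does not depend on mask size;
    # if the whole map is target region, there is nothing to align.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    ref = [[0.0, 0.0], [0.0, 0.0]]
    mask = [[1.0, 0.0], [0.0, 0.0]]  # only the top-left position is "target"
    # A change inside the target region is not penalised.
    print(masked_alignment_loss([[1.0, 0.0], [0.0, 0.0]], ref, mask))
    # A change outside it is penalised: 4 / 3 of total weight.
    print(masked_alignment_loss([[0.0, 2.0], [0.0, 0.0]], ref, mask))
```

Weighting by (1 - map) rather than aligning everything is one simple way to express "align unrelated attributes but not the target one"; a soft map also lets boundary regions receive partial alignment.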
