Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation

Abstract

We present Subject Fidelity Optimization (SFO), a comparative learning framework that improves subject fidelity in zero-shot subject-driven generation. Supervised fine-tuning methods rely only on positive targets and reuse the diffusion loss from the pre-training stage. SFO goes further: it introduces synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. To obtain negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically generates distinctive and informative negatives by intentionally degrading visual and textual cues, without expensive human annotation. We also reweight the diffusion timesteps to focus fine-tuning on the intermediate steps where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms baselines in both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/
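The abstract does not give the exact form of the pairwise objective or of the timestep reweighting, so the following is only an illustrative sketch of one plausible reading and not the authors' implementation. It assumes a logistic (Bradley-Terry style) comparison between the denoising errors on a positive and a negative target, and a bell-shaped weight that peaks at intermediate timesteps. The function names, the `beta` scale, and the `center` and `width` parameters are all hypothetical.

```python
import math


def pairwise_preference_loss(err_pos, err_neg, beta=1.0):
    """Logistic loss on a pair of denoising errors (assumed form).

    The loss is small when the model reconstructs the positive target
    better than the negative one, i.e. when err_pos < err_neg.
    """
    margin = beta * (err_neg - err_pos)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def timestep_weight(t, num_steps=1000, center=0.5, width=0.2):
    """Hypothetical bell-shaped weight that emphasizes intermediate timesteps."""
    x = t / num_steps
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))


def weighted_pair_loss(err_pos, err_neg, t, beta=1.0):
    """Pairwise loss scaled by the (assumed) timestep weight."""
    return timestep_weight(t) * pairwise_preference_loss(err_pos, err_neg, beta)


if __name__ == "__main__":
    # A pair where the positive target is denoised better than the negative one.
    print(weighted_pair_loss(err_pos=0.10, err_neg=0.30, t=500))
    # The same pair at an early timestep contributes less to the total loss.
    print(weighted_pair_loss(err_pos=0.10, err_neg=0.30, t=50))
```

In this sketch the negative's error would come from a CDNS-style sample generated under degraded visual or textual conditions; how those samples are produced and how the errors are computed is left to the paper.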
