Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation

Abstract

We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Existing supervised fine-tuning methods rely only on positive targets and reuse the diffusion loss from the pre-training stage. SFO goes beyond them by introducing synthetic negative targets and explicitly guiding the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically generates distinctive and informative negatives by intentionally degrading visual and textual cues, without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms baselines in both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/
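
The abstract does not state the exact objective, so the following is only a minimal sketch of how a pairwise comparison over positive and negative targets could be combined with timestep reweighting. It assumes a Diffusion-DPO-style logistic loss on denoising errors measured against a frozen reference model, and a bell-shaped timestep weight. The function names, the `beta`, `center`, and `width` parameters, and the weight shape are all hypothetical illustrations, not the authors' formulation.

```python
import math


def timestep_weight(t, num_steps=1000, center=0.5, width=0.2):
    """Hypothetical bell-shaped weight that peaks at intermediate timesteps.

    The shape, center, and width are assumptions made for illustration only.
    """
    u = t / num_steps  # normalize the timestep to [0, 1]
    return math.exp(-0.5 * ((u - center) / width) ** 2)


def pairwise_preference_loss(err_pos, err_neg, ref_err_pos, ref_err_neg, t, beta=1.0):
    """Logistic pairwise loss on denoising errors (assumed form).

    err_pos, err_neg: denoising errors of the fine-tuned model on the
        positive and negative targets.
    ref_err_pos, ref_err_neg: the same errors under a frozen reference model.
    """
    # The margin is positive when the fine-tuned model improves on the
    # positive target more than on the negative target, relative to the reference.
    margin = (ref_err_pos - err_pos) - (ref_err_neg - err_neg)
    # -log(sigmoid(beta * margin)) equals softplus(-beta * margin);
    # computed here in a numerically stable way.
    x = -beta * margin
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return timestep_weight(t) * softplus
```

Under these assumptions, the loss decreases as the model fits the positive target better than the negative one, and pairs sampled at very early or very late timesteps contribute less than pairs sampled near the middle of the schedule.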
