SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
December 26, 2023
Authors: Yuxuan Zhang, Jiaming Liu, Yiren Song, Rui Wang, Hao Tang, Jinpeng Yu, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing
cs.AI
Abstract
Recent advancements in subject-driven image generation have led to zero-shot
generation, yet precise selection and focus on crucial subject representations
remain challenging. Addressing this, we introduce the SSR-Encoder, a novel
architecture designed for selectively capturing any subject from single or
multiple reference images. It responds to various query modalities including
text and masks, without necessitating test-time fine-tuning. The SSR-Encoder
combines a Token-to-Patch Aligner that aligns query inputs with image patches
and a Detail-Preserving Subject Encoder for extracting and preserving fine
features of the subjects, thereby generating subject embeddings. These
embeddings, used in conjunction with original text embeddings, condition the
generation process. Characterized by its model generalizability and efficiency,
the SSR-Encoder adapts to a range of custom models and control modules.
Training is enhanced by an Embedding Consistency Regularization Loss, and
extensive experiments demonstrate the SSR-Encoder's effectiveness in versatile,
high-quality image generation, indicating its broad applicability. Project
page: https://ssr-encoder.github.io
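
The abstract describes aligning a query with image patches and then producing a subject embedding, but gives no implementation details. The following is only a minimal sketch of that general idea using plain scaled dot-product attention. It is not the authors' Token-to-Patch Aligner or Detail-Preserving Subject Encoder: the function names, the toy vectors, and the absence of learned projections are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def align_query_to_patches(query, patches):
    """Attention weights of one query vector over a list of patch vectors.

    Hypothetical helper: generic scaled dot-product attention with no
    learned projections, not the paper's aligner.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * p for q, p in zip(query, patch)) / scale
              for patch in patches]
    return softmax(scores)

def pool_subject_embedding(weights, patches):
    """Attention-weighted average of patch vectors (a toy subject embedding)."""
    dim = len(patches[0])
    return [sum(w * patch[d] for w, patch in zip(weights, patches))
            for d in range(dim)]

# Toy data (assumed): three 2-D patch features and a query vector that
# resembles the first patch.
patches = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
query = [4.0, 0.0]

weights = align_query_to_patches(query, patches)
subject_embedding = pool_subject_embedding(weights, patches)
```

In this toy setup the weights form a distribution over patches that concentrates on the patch most similar to the query, so the pooled vector is dominated by that patch. A mask query could be approximated in the same sketch by supplying the weights directly instead of computing them from a text-like query vector.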