SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
December 26, 2023
Authors: Yuxuan Zhang, Jiaming Liu, Yiren Song, Rui Wang, Hao Tang, Jinpeng Yu, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing
cs.AI
Abstract
Recent advancements in subject-driven image generation have led to zero-shot
generation, yet precise selection and focus on crucial subject representations
remain challenging. Addressing this, we introduce the SSR-Encoder, a novel
architecture designed for selectively capturing any subject from single or
multiple reference images. It responds to various query modalities including
text and masks, without necessitating test-time fine-tuning. The SSR-Encoder
combines a Token-to-Patch Aligner that aligns query inputs with image patches
and a Detail-Preserving Subject Encoder for extracting and preserving fine
features of the subjects, thereby generating subject embeddings. These
embeddings, used in conjunction with original text embeddings, condition the
generation process. Characterized by its model generalizability and efficiency,
the SSR-Encoder adapts to a range of custom models and control modules.
Training is further improved by an Embedding Consistency Regularization Loss.
Extensive experiments demonstrate the encoder's effectiveness in versatile,
high-quality image generation, indicating broad applicability. Project
page: https://ssr-encoder.github.io
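
The abstract names a Token-to-Patch Aligner but does not spell out its mechanics. As a rough illustration only, the sketch below assumes the alignment can be modelled as scaled dot-product attention from query tokens to image patch features, followed by attention-weighted pooling of the patches into one embedding per query token. The function names (`token_to_patch_attention`, `pool_subject_embeddings`), the toy vectors, and the pooling step are hypothetical and are not taken from the SSR-Encoder code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def token_to_patch_attention(query_tokens, patches):
    """Return, for each query token, a softmax distribution over patches.

    Assumption: similarity is a scaled dot product, as in standard attention.
    """
    scale = math.sqrt(len(patches[0]))
    attn = []
    for q in query_tokens:
        scores = [sum(qi * pi for qi, pi in zip(q, p)) / scale for p in patches]
        attn.append(softmax(scores))
    return attn


def pool_subject_embeddings(attn, patches):
    """Attention-weighted average of patch features, one vector per query token."""
    dim = len(patches[0])
    return [
        [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]
        for weights in attn
    ]


# Toy data: three 2-D patch features and one query token that matches patch 0.
patches = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
query_tokens = [[4.0, 0.0]]

attn = token_to_patch_attention(query_tokens, patches)
subject = pool_subject_embeddings(attn, patches)
```

In this toy case nearly all attention mass falls on the first patch, so the pooled vector stays close to that patch's features. How such embeddings are combined with the text embeddings to condition generation is not described in the abstract, so the sketch stops at the pooled embedding.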