MONKEY: Masking ON KEY-Value Activation Adapter for Personalization
October 9, 2025
Author: James Baker
cs.AI
Abstract
Personalizing diffusion models allows users to generate new images that
incorporate a given subject, offering more control than a text prompt alone.
However, these models often fall short when they simply recreate the subject
image and ignore the text prompt. We observe that one popular personalization
method, the IP-Adapter, automatically generates masks during inference that
cleanly segment the subject from the background. We propose using this
automatically generated mask in a second pass to mask the image tokens,
restricting them to the subject rather than the background so that the text
prompt can attend to the rest of the image. For text prompts describing
locations and places, this produces images that accurately depict the subject
while closely matching the prompt. We compare our method to a few other
test-time personalization methods and find that it achieves high alignment
with both the prompt and the source image.
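The two-pass idea can be illustrated with a toy sketch. It assumes IP-Adapter's decoupled cross-attention, in which the output is the text-branch attention plus a scaled image-branch attention that share the same queries. The abstract does not say exactly how the mask is read off the first pass, so `subject_mask` is a hypothetical stand-in: for each spatial position it thresholds the share of joint attention mass that lands on image tokens. The names `subject_mask`, `masked_decoupled_attention`, `threshold`, and `scale` are illustrative assumptions, not the paper's code, and plain Python lists stand in for real tensors.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # each row is one token (keys/values) or one spatial position (queries)


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: every query row attends over all key rows."""
    d = len(q[0])
    out = []
    for qi in q:
        weights = softmax([dot(qi, kj) / math.sqrt(d) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def subject_mask(q: Matrix, k_txt: Matrix, k_img: Matrix, threshold: float = 0.5) -> Vector:
    """HYPOTHETICAL first-pass mask (not the paper's exact rule).

    For each spatial position, softmax jointly over text and image keys and
    mark the position as "subject" (1.0) if the image tokens receive more
    than `threshold` of the attention mass; otherwise "background" (0.0).
    """
    d = len(q[0])
    n_txt = len(k_txt)
    mask = []
    for qi in q:
        w = softmax([dot(qi, kj) / math.sqrt(d) for kj in k_txt + k_img])
        image_share = sum(w[n_txt:])
        mask.append(1.0 if image_share > threshold else 0.0)
    return mask


def masked_decoupled_attention(q: Matrix, k_txt: Matrix, v_txt: Matrix,
                               k_img: Matrix, v_img: Matrix,
                               mask: Vector, scale: float = 1.0) -> Matrix:
    """Second pass: text branch + scale * image branch, with the image branch
    zeroed at background positions so only the text prompt shapes them."""
    txt = attention(q, k_txt, v_txt)
    img = attention(q, k_img, v_img)
    return [[t + scale * m * i for t, i in zip(trow, irow)]
            for trow, irow, m in zip(txt, img, mask)]


if __name__ == "__main__":
    # Two spatial positions, one text token, one image token (toy numbers).
    q = [[4.0, 0.0], [0.0, 4.0]]
    k_txt, v_txt = [[0.0, 1.0]], [[0.0, 1.0]]
    k_img, v_img = [[1.0, 0.0]], [[1.0, 0.0]]
    mask = subject_mask(q, k_txt, k_img)
    print(mask)  # [1.0, 0.0]
    print(masked_decoupled_attention(q, k_txt, v_txt, k_img, v_img, mask))
    # [[1.0, 1.0], [0.0, 1.0]]
```

In the toy run, position 0 attends mostly to the image token, so it falls inside the mask and receives the image-branch contribution; position 1 falls outside and receives only the text-branch output. Zeroing the image branch outside the mask, rather than renormalizing the attention, is just one simple choice made for this sketch.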