
pOps: Photo-Inspired Diffusion Operators

June 3, 2024
作者: Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or
cs.AI

Abstract

Text-guided image generation enables the creation of visual content from textual descriptions. However, certain visual concepts cannot be effectively conveyed through language alone. This has sparked a renewed interest in utilizing the CLIP image embedding space for more visually-oriented tasks through methods such as IP-Adapter. Interestingly, the CLIP image embedding space has been shown to be semantically meaningful, where linear operations within this space yield semantically meaningful results. Yet, the specific meaning of these operations can vary unpredictably across different images. To harness this potential, we introduce pOps, a framework that trains specific semantic operators directly on CLIP image embeddings. Each pOps operator is built upon a pretrained Diffusion Prior model. While the Diffusion Prior model was originally trained to map between text embeddings and image embeddings, we demonstrate that it can be tuned to accommodate new input conditions, resulting in a diffusion operator. Working directly over image embeddings not only improves our ability to learn semantic operations but also allows us to directly use a textual CLIP loss as additional supervision when needed. We show that pOps can be used to learn a variety of photo-inspired operators with distinct semantic meanings, highlighting the semantic diversity and potential of our proposed approach.
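The core idea, that simple geometric operations on CLIP image embeddings can be semantically meaningful, can be illustrated with a toy sketch. This is not the paper's code: the embedding dimension, the random stand-in vectors, and the hand-crafted spherical-interpolation "operator" are all illustrative assumptions; pOps instead *learns* such operators with a tuned diffusion prior.

```python
import numpy as np

def normalize(v):
    # CLIP embeddings are typically compared on the unit hypersphere
    return v / np.linalg.norm(v)

def slerp(a, b, t):
    """Spherical interpolation between two unit-norm embeddings.

    A hand-crafted stand-in for an embedding-space operator; the result
    of combining two image embeddings is itself a valid embedding that
    a decoder such as IP-Adapter could, in principle, render.
    """
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    so = np.sin(omega)
    if so < 1e-8:  # nearly parallel: fall back to linear interpolation
        return normalize((1 - t) * a + t * b)
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / so

rng = np.random.default_rng(0)
embed_dim = 768  # illustrative; a common CLIP image-embedding width
e1 = normalize(rng.standard_normal(embed_dim))  # stand-in for CLIP(image_1)
e2 = normalize(rng.standard_normal(embed_dim))  # stand-in for CLIP(image_2)

mixed = slerp(e1, e2, 0.5)  # a "blend" of the two image concepts
print(round(float(np.linalg.norm(mixed)), 6))  # result stays unit-norm
```

The sketch also shows why such operations are unpredictable in practice: the midpoint is always a well-formed embedding, but which semantic attributes it inherits from each input is not controlled. Replacing the fixed `slerp` with a network trained per operation is precisely the gap pOps addresses.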
