pOps: Photo-Inspired Diffusion Operators

June 3, 2024
Authors: Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or
cs.AI

Abstract

Text-guided image generation enables the creation of visual content from textual descriptions. However, certain visual concepts cannot be effectively conveyed through language alone. This has sparked renewed interest in using the CLIP image embedding space for more visually oriented tasks through methods such as IP-Adapter. Interestingly, the CLIP image embedding space has been shown to be semantically meaningful: linear operations within this space yield semantically meaningful results. Yet the specific meaning of these operations can vary unpredictably across different images. To harness this potential, we introduce pOps, a framework that trains specific semantic operators directly on CLIP image embeddings. Each pOps operator is built upon a pretrained Diffusion Prior model. While the Diffusion Prior model was originally trained to map between text embeddings and image embeddings, we demonstrate that it can be tuned to accommodate new input conditions, resulting in a diffusion operator. Working directly over image embeddings not only improves our ability to learn semantic operations but also allows us to directly use a textual CLIP loss as additional supervision when needed. We show that pOps can be used to learn a variety of photo-inspired operators with distinct semantic meanings, highlighting the semantic diversity and potential of our proposed approach.
