

Zero-shot Image Editing with Reference Imitation

June 11, 2024
作者: Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao
cs.AI

Abstract

Image editing is a practical yet challenging task given the diverse demands from users, and one of the hardest parts is precisely describing how the edited image should look. In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. Concretely, to edit an image region of interest, users are free to draw inspiration directly from in-the-wild references (e.g., relevant pictures encountered online), without having to worry about the fit between the reference and the source. Such a design requires the system to automatically figure out what to take from the reference to perform the editing. For this purpose, we propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using information from the other frame. In this way, our model, developed from a diffusion prior, captures the semantic correspondence between separate images in a self-supervised manner. We experimentally demonstrate the effectiveness of our method across various test cases, as well as its superiority over existing alternatives. We also construct a benchmark to facilitate further research.
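The self-supervised pair construction described above (sample two frames from one clip, mask part of one, use the other as reference) can be sketched as follows. This is a minimal illustration with NumPy, assuming frames are `H×W×3` arrays; the function name, the rectangular-mask heuristic, and the zero-fill convention are hypothetical choices, not details from the paper.

```python
import numpy as np

def make_training_pair(frames, rng=None):
    """Sketch of MimicBrush-style pair construction: sample two frames
    from one video clip, mask part of the first, and train a model to
    inpaint the masked region using the second frame as reference."""
    rng = rng or np.random.default_rng()
    i, j = rng.choice(len(frames), size=2, replace=False)
    source, reference = frames[i], frames[j]
    h, w = source.shape[:2]
    # Random rectangular mask covering part of the source frame.
    mh = rng.integers(h // 4, h // 2)
    mw = rng.integers(w // 4, w // 2)
    top = rng.integers(0, h - mh)
    left = rng.integers(0, w - mw)
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + mh, left:left + mw] = True
    masked = source.copy()
    masked[mask] = 0  # zero out the masked pixels
    # A diffusion model would then be trained to reconstruct `source`
    # from (`masked`, `mask`, `reference`).
    return masked, mask, reference, source
```

Because both frames come from the same clip, they depict the same content under natural variation (pose, viewpoint, lighting), so recovering the masked region forces the model to find semantic correspondences rather than copy pixels directly.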

