
Zero-shot Image Editing with Reference Imitation

June 11, 2024
Authors: Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao
cs.AI

Abstract

Image editing serves as a practical yet challenging task considering the diverse demands from users, where one of the hardest parts is to precisely describe how the edited image should look. In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. Concretely, to edit an image region of interest, users are free to directly draw inspiration from some in-the-wild references (e.g., related pictures encountered online), without having to cope with the fit between the reference and the source. Such a design requires the system to automatically figure out what to take from the reference to perform the editing. For this purpose, we propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame. That way, our model, developed from a diffusion prior, is able to capture the semantic correspondence between separate images in a self-supervised manner. We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives. We also construct a benchmark to facilitate further research.
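The self-supervised data construction described above (pick two frames from a clip, mask part of one, recover it from the other) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `make_training_pair` and the rectangular-mask choice are assumptions for clarity, and the actual MimicBrush pipeline operates on latents with a diffusion model rather than raw pixels.

```python
# Sketch (hypothetical, not the authors' implementation) of building a
# MimicBrush-style self-supervised training sample from a video clip.
import numpy as np

def make_training_pair(clip, mask_frac=0.3, rng=None):
    """clip: array of shape (T, H, W, C) holding T video frames.

    Returns (masked, mask, reference, target): the model would learn to
    reconstruct `target` from (`masked`, `mask`) while attending to the
    `reference` frame, which shows the same content in another pose/view.
    """
    rng = rng or np.random.default_rng()

    # Randomly select two distinct frames from the clip.
    t1, t2 = rng.choice(len(clip), size=2, replace=False)
    target, reference = clip[t1].copy(), clip[t2]

    # Place a random rectangular mask covering mask_frac of each side.
    h, w = target.shape[:2]
    mh, mw = int(h * mask_frac), int(w * mask_frac)
    y = rng.integers(0, h - mh + 1)
    x = rng.integers(0, w - mw + 1)
    mask = np.zeros((h, w), dtype=bool)
    mask[y:y + mh, x:x + mw] = True

    # Zero out the masked region; this is what the model must recover
    # by finding the corresponding region in `reference`.
    masked = target.copy()
    masked[mask] = 0
    return masked, mask, reference, target
```

Because both frames come from the same clip, the reference depicts the masked content under natural variation (motion, lighting, viewpoint), which is what pushes the model to learn semantic correspondence rather than simple copying.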

