Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
May 18, 2023
Authors: Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt
cs.AI
Abstract
Synthesizing visual content that meets users' needs often requires flexible
and precise controllability of the pose, shape, expression, and layout of the
generated objects. Existing approaches gain controllability of generative
adversarial networks (GANs) via manually annotated training data or a prior 3D
model, which often lack flexibility, precision, and generality. In this work,
we study a powerful yet much less explored way of controlling GANs, that is, to
"drag" any points of the image to precisely reach target points in a
user-interactive manner, as shown in Fig.1. To achieve this, we propose
DragGAN, which consists of two main components: 1) a feature-based motion
supervision that drives the handle point to move towards the target position,
and 2) a new point tracking approach that leverages the discriminative
generator features to keep localizing the position of the handle points.
Through DragGAN, anyone can deform an image with precise control over where
pixels go, thus manipulating the pose, shape, expression, and layout of diverse
categories such as animals, cars, humans, landscapes, etc. As these
manipulations are performed on the learned generative image manifold of a GAN,
they tend to produce realistic outputs even for challenging scenarios such as
hallucinating occluded content and deforming shapes that consistently follow
the object's rigidity. Both qualitative and quantitative comparisons
demonstrate the advantage of DragGAN over prior approaches in the tasks of
image manipulation and point tracking. We also showcase the manipulation of
real images through GAN inversion.
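The abstract only names the two components; the sketch below is a minimal, hypothetical illustration of what feature-based motion supervision and point tracking on generator feature maps could look like. The function names, tensor shapes, bilinear sampler, loss form, and patch/search radii are all assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (not the official DragGAN implementation):
# (1) motion supervision nudges features around a handle point toward the target,
# (2) point tracking relocates the handle by nearest-neighbour search in feature space.
import torch
import torch.nn.functional as F


def sample_feature(feat, point):
    """Bilinearly sample a (C, H, W) feature map at a single (x, y) point."""
    _, H, W = feat.shape
    # Normalize the point to the [-1, 1] grid expected by grid_sample.
    grid = torch.tensor([[[[2 * point[0] / (W - 1) - 1,
                            2 * point[1] / (H - 1) - 1]]]], dtype=feat.dtype)
    return F.grid_sample(feat.unsqueeze(0), grid, align_corners=True).view(-1)


def motion_supervision_loss(feat, handle, target, radius=3):
    """Pull features in a small patch around the handle one unit step toward the
    target; the detached reference features act as the anchor so that optimizing
    the latent code moves image content rather than the loss target."""
    direction = torch.tensor(target, dtype=feat.dtype) - torch.tensor(handle, dtype=feat.dtype)
    direction = direction / (direction.norm() + 1e-8)
    loss = feat.new_zeros(())
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            p = (handle[0] + dx, handle[1] + dy)
            shifted = (p[0] + direction[0].item(), p[1] + direction[1].item())
            loss = loss + F.l1_loss(sample_feature(feat, shifted),
                                    sample_feature(feat, p).detach())
    return loss


def track_handle(feat, prev_feat, handle, search_radius=5):
    """Re-localize the handle point after an optimization step by nearest-neighbour
    search in feature space within a window around its previous position."""
    ref = sample_feature(prev_feat, handle)
    best, best_dist = handle, float("inf")
    for dx in range(-search_radius, search_radius + 1):
        for dy in range(-search_radius, search_radius + 1):
            p = (handle[0] + dx, handle[1] + dy)
            d = (sample_feature(feat, p) - ref).norm().item()
            if d < best_dist:
                best, best_dist = p, d
    return best
```

In an interactive loop, one would repeatedly backpropagate the motion-supervision loss into the generator's latent code, regenerate the feature maps, and then call the tracking step to update the handle position before the next iteration, stopping once the handle reaches the target.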