Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
March 19, 2024
Authors: Hadi Alzayer, Zhihao Xia, Xuaner Zhang, Eli Shechtman, Jia-Bin Huang, Michael Gharbi
cs.AI
Abstract
We propose a generative model that, given a coarsely edited image,
synthesizes a photorealistic output that follows the prescribed layout. Our
method transfers fine details from the original image, preserving the
identity of its parts while adapting them to the lighting and context defined
by the new layout. Our key insight is that videos are a powerful source of
supervision for this task: objects and camera motions provide many observations
of how the world changes with viewpoint, lighting, and physical interactions.
We construct an image dataset in which each sample is a pair of source and
target frames extracted from the same video at randomly chosen time intervals.
We warp the source frame toward the target using two motion models that mimic
the expected test-time user edits. We supervise our model to translate the
warped image into the ground truth, starting from a pretrained diffusion model.
Our model design explicitly enables fine detail transfer from the source frame
to the generated image, while closely following the user-specified layout. We
show that by using simple segmentations and coarse 2D manipulations, we can
synthesize a photorealistic edit faithful to the user's input while addressing
second-order effects like harmonizing the lighting and physical interactions
between edited objects.
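
To make the training recipe described above concrete, here is a minimal PyTorch sketch of the pipeline; it is an illustration under stated assumptions, not the authors' implementation. All names (`sample_frame_pair`, `coarse_2d_edit`, `TinyDenoiser`, `training_step`) are hypothetical: a single cut-and-paste translation stands in for the paper's two motion models, and a generic epsilon-prediction objective stands in for its exact diffusion loss.

```python
import torch
import torch.nn.functional as F


def sample_frame_pair(video: torch.Tensor, max_gap: int = 30):
    """Draw a (source, target) pair from one video at a random time interval.

    video: (T, C, H, W) tensor of frames.
    """
    t = video.shape[0]
    src = int(torch.randint(0, t - 1, (1,)))
    tgt = min(src + int(torch.randint(1, max_gap + 1, (1,))), t - 1)
    return video[src], video[tgt]


def coarse_2d_edit(img: torch.Tensor, mask: torch.Tensor, dx: int, dy: int):
    """Crude stand-in for the paper's motion models: copy one segmented
    region (mask: (1, H, W) in {0, 1}) and paste it translated by (dx, dy)
    pixels, mimicking a coarse test-time user edit."""
    shifted = torch.roll(img * mask, shifts=(dy, dx), dims=(1, 2))
    shifted_mask = torch.roll(mask, shifts=(dy, dx), dims=(1, 2))
    return img * (1 - shifted_mask) + shifted


class TinyDenoiser(torch.nn.Module):
    """Toy stand-in for the pretrained diffusion backbone; a real model
    would start from pretrained weights and embed the timestep t, which
    this stub ignores."""

    def __init__(self, channels: int = 3):
        super().__init__()
        # Conditioning is concatenated channel-wise with the noisy input.
        self.net = torch.nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, noisy, t, cond):
        return self.net(torch.cat([noisy, cond], dim=1))


def training_step(denoiser, warped, target, num_steps: int = 1000):
    """One generic epsilon-prediction step: condition on the coarsely edited
    (warped) frame and learn to recover the ground-truth target frame."""
    noise = torch.randn_like(target)
    t = torch.randint(0, num_steps, (target.shape[0],))
    # Toy cosine noise schedule, broadcast over (B, C, H, W).
    a = (torch.cos(t.float() / num_steps * torch.pi / 2) ** 2).view(-1, 1, 1, 1)
    noisy = a.sqrt() * target + (1 - a).sqrt() * noise
    pred = denoiser(noisy, t, cond=warped)
    return F.mse_loss(pred, noise)


# Toy end-to-end run on random data.
video = torch.rand(60, 3, 64, 64)
src, tgt = sample_frame_pair(video)
mask = torch.zeros(1, 64, 64)
mask[:, 16:32, 16:32] = 1.0                    # pretend segmentation
warped = coarse_2d_edit(src, mask, dx=8, dy=4)
loss = training_step(TinyDenoiser(), warped.unsqueeze(0), tgt.unsqueeze(0))
```

The sketch only captures the (coarse edit, ground-truth target) supervision signal; the system described in the abstract additionally includes an explicit detail-transfer path from the source frame into the generated image.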