ObjectMover: Generative Object Movement with Video Prior
March 11, 2025
Authors: Xin Yu, Tianyu Wang, Soo Ye Kim, Paul Guerrero, Xi Chen, Qing Liu, Zhe Lin, Xiaojuan Qi
cs.AI
Abstract
Simple as it seems, moving an object to another location within an image is,
in fact, a challenging image-editing task that requires re-harmonizing the
lighting, adjusting the pose based on perspective, accurately filling occluded
regions, and ensuring coherent synchronization of shadows and reflections while
maintaining the object identity. In this paper, we present ObjectMover, a
generative model that can perform object movement in highly challenging scenes.
Our key insight is to model this task as a sequence-to-sequence problem
and fine-tune a video generation model, leveraging its knowledge of consistent
object generation across video frames. We show that with this approach, our
model is able to adjust to complex real-world scenarios, handling extreme
lighting harmonization and object effect movement. As large-scale data for
object movement are unavailable, we construct a data generation pipeline using
a modern game engine to synthesize high-quality data pairs. We further propose
a multi-task learning strategy that enables training on real-world video data
to improve the model's generalization. Through extensive experiments, we
demonstrate that ObjectMover achieves outstanding results and adapts well to
real-world scenarios.
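The sequence-to-sequence framing described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function name `build_seq2seq_input`, the channel layout, and the two-frame packing are hypothetical, not the paper's actual architecture. The idea shown is that the source image and the edit specification (object mask, target location) are packed as conditioning "frames" of a short clip, so a fine-tuned video backbone can generate the edited image as the next "frame."

```python
import numpy as np

def build_seq2seq_input(image, obj_mask, target_mask):
    """Pack an object-movement edit as a short 'video' clip.

    Hypothetical layout (not the paper's): frame 0 carries the source
    image; frame 1 carries the object mask and target-location mask as
    color channels. A video model fine-tuned on such pairs would then
    predict the edited image as a subsequent frame.
    """
    h, w, _ = image.shape
    # Frame 0: the source image itself, shape (H, W, 3).
    frame0 = image
    # Frame 1: the two masks plus a zero channel, so all frames share a shape.
    cond = np.stack([obj_mask, target_mask, np.zeros((h, w))], axis=-1)
    # Stack into a (num_frames, H, W, C) clip for a video backbone.
    return np.stack([frame0, cond], axis=0)

# Toy example: 64x64 RGB image, object at top-left, target at bottom-right.
img = np.random.rand(64, 64, 3)
obj = np.zeros((64, 64)); obj[:16, :16] = 1.0
tgt = np.zeros((64, 64)); tgt[-16:, -16:] = 1.0
clip = build_seq2seq_input(img, obj, tgt)
print(clip.shape)  # (2, 64, 64, 3)
```

Framing the edit this way lets the model reuse what a video prior already knows: objects that move between frames keep their identity while their lighting, shadows, and occlusions change coherently.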