ROSE: Remove Objects with Side Effects in Videos
August 26, 2025
Authors: Chenxuan Miao, Yutong Feng, Jianshu Zeng, Zixiang Gao, Hantang Liu, Yunfeng Yan, Donglian Qi, Xi Chen, Bin Wang, Hengshuang Zhao
cs.AI
Abstract
Video object removal has advanced considerably thanks to the recent
success of video generative models. However, existing methods struggle to
eliminate the side effects of objects, such as their shadows and reflections,
because paired video data for supervision is scarce.
This paper presents ROSE (Remove Objects with Side Effects), a framework
that systematically studies an object's effects on its environment, which can be
categorized into five common cases: shadows, reflections, light, translucency,
and mirrors. Given the challenges of curating paired videos exhibiting the
aforementioned effects, we leverage a 3D rendering engine for synthetic data
generation. We carefully construct a fully automatic pipeline for data
preparation, which simulates a large-scale paired dataset with diverse scenes,
objects, shooting angles, and camera trajectories. ROSE is implemented as a
video inpainting model built on a diffusion transformer. To localize all
object-correlated areas, the entire video is fed into the model for
reference-based erasing. Moreover, additional supervision is introduced to
explicitly predict the areas affected by side effects, which can be revealed
through the differential mask between the paired videos. To fully investigate
model performance on the removal of various side effects, we present a new
benchmark, dubbed ROSE-Bench, which incorporates both common scenarios and the
five special side effects for comprehensive evaluation. Experimental results
demonstrate that ROSE achieves superior performance compared to existing video
object erasing models and generalizes well to real-world video scenarios. The
project page is https://rose2025-inpaint.github.io/.
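
The abstract says that the areas affected by side effects can be revealed through the differential mask between the paired videos. The sketch below illustrates one plausible way to compute such a mask with NumPy; it is not the authors' implementation. The function name `difference_mask`, the per-pixel maximum over channels, and the threshold of 0.05 are all assumptions made for illustration, as is the toy video pair.

```python
import numpy as np


def difference_mask(video_with, video_without, threshold=0.05):
    """Return a boolean (T, H, W) mask of pixels that differ between two videos.

    Both inputs are assumed to be float arrays of shape (T, H, W, C) in [0, 1].
    The threshold is a hypothetical choice, not a value from the paper.
    """
    if video_with.shape != video_without.shape:
        raise ValueError("paired videos must have the same shape")
    diff = np.abs(video_with.astype(np.float32) - video_without.astype(np.float32))
    # A pixel counts as changed if any colour channel differs by more than the threshold.
    return diff.max(axis=-1) > threshold


# Toy paired videos: 2 frames of 4x4 RGB with a uniform grey background.
clean = np.full((2, 4, 4, 3), 0.5, dtype=np.float32)
edited = clean.copy()
edited[:, 1:3, 1:3, :] = 1.0  # the object itself (2x2 bright block)
edited[:, 3, 1:3, :] = 0.3    # its shadow (darkened strip under the object)

# Object mask as a user might supply it: covers the object but not the shadow.
object_mask = np.zeros((2, 4, 4), dtype=bool)
object_mask[:, 1:3, 1:3] = True

mask = difference_mask(edited, clean)
# Changed pixels outside the object mask are attributed to side effects.
side_effect_mask = mask & ~object_mask

print(mask.sum(axis=(1, 2)))              # changed pixels per frame
print(side_effect_mask.sum(axis=(1, 2)))  # side-effect pixels per frame
```

In this toy pair the mask covers both the object and its shadow, while the object mask alone misses the shadow; that gap is the region the additional supervision is meant to capture.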