ROSE: Remove Objects with Side Effects in Videos

Abstract

Video object removal has made significant progress owing to the recent success of video generative models. However, existing methods struggle to eliminate the side effects of objects, such as their shadows and reflections, because paired video data for supervision is scarce. This paper presents ROSE (Remove Objects with Side Effects), a framework that systematically studies how objects affect their environment, categorizing these effects into five common cases: shadows, reflections, light, translucency, and mirror. Because curating paired videos that exhibit these effects is challenging, we leverage a 3D rendering engine to generate synthetic data. We carefully construct a fully automatic data-preparation pipeline that simulates a large-scale paired dataset with diverse scenes, objects, shooting angles, and camera trajectories. ROSE is implemented as a video inpainting model built on a diffusion transformer. To localize all object-correlated areas, the entire video is fed into the model for reference-based erasing. Moreover, additional supervision is introduced to explicitly predict the areas affected by side effects, which can be revealed through the differential mask between the paired videos. To comprehensively evaluate model performance on the removal of various side effects, we present a new benchmark, dubbed ROSE-Bench, which covers both common scenarios and the five special side effects. Experimental results demonstrate that ROSE outperforms existing video object erasing models and generalizes well to real-world videos. The project page is https://rose2025-inpaint.github.io/.
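
The idea of deriving side-effect supervision from the differential mask between paired videos can be illustrated with a minimal sketch. It assumes a toy data layout (videos as nested Python lists of RGB tuples in [0, 1]), an arbitrary per-pixel threshold, and binary cross-entropy as the mask-prediction loss; none of these details come from the abstract, and the names `differential_mask`, `side_effect_region`, and `mask_bce_loss` are hypothetical, not ROSE's actual implementation.

```python
import math
from typing import List, Tuple

# Hypothetical toy representation (an assumption, not ROSE's data format):
# a pixel is an RGB tuple with channels in [0, 1], a frame is an H x W grid
# of pixels, and a video is a list of frames.
Pixel = Tuple[float, float, float]
Frame = List[List[Pixel]]
Video = List[Frame]
Mask = List[List[List[int]]]  # T x H x W grid of 0/1 values


def differential_mask(with_obj: Video, without_obj: Video,
                      threshold: float = 0.05) -> Mask:
    """Mark pixels that differ between a rendered pair: the same scene with
    and without the object. `threshold` is an assumed value."""
    if len(with_obj) != len(without_obj):
        raise ValueError("paired videos must have the same number of frames")
    mask: Mask = []
    for frame_a, frame_b in zip(with_obj, without_obj):
        # Flag a pixel when its largest per-channel difference exceeds the threshold.
        mask.append([
            [1 if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > threshold else 0
             for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)
        ])
    return mask


def side_effect_region(diff: Mask, object_mask: Mask) -> Mask:
    """Pixels changed by the object's presence but lying outside the object
    itself, e.g. its shadow or reflection."""
    return [
        [[1 if d and not o else 0 for d, o in zip(d_row, o_row)]
         for d_row, o_row in zip(d_frame, o_frame)]
        for d_frame, o_frame in zip(diff, object_mask)
    ]


def mask_bce_loss(pred: List[List[List[float]]], target: Mask,
                  eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted per-pixel probabilities and
    the differential mask (BCE is an assumed choice of loss)."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p_row, t_row in zip(p_frame, t_frame):
            for p, t in zip(p_row, t_row):
                p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
                total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
                count += 1
    return total / count


# Usage: a single 2x2 frame. The object occupies pixel (0, 0) and casts a
# shadow that darkens pixel (0, 1); the rest of the scene is unchanged.
gray = (0.5, 0.5, 0.5)
with_obj = [[[(0.9, 0.1, 0.1), (0.3, 0.3, 0.3)], [gray, gray]]]
without_obj = [[[gray, gray], [gray, gray]]]
object_mask = [[[1, 0], [0, 0]]]

diff = differential_mask(with_obj, without_obj)
print(diff)                                   # [[[1, 1], [0, 0]]]
print(side_effect_region(diff, object_mask))  # [[[0, 1], [0, 0]]]
print(round(mask_bce_loss([[[0.9, 0.9], [0.1, 0.1]]], diff), 4))  # 0.1054
```

The toy example shows why the differential mask is a useful supervision target: an object mask alone covers only pixel (0, 0), while the difference between the paired renders also exposes the shadowed pixel (0, 1) that must be erased.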