ROSE: Remove Objects with Side Effects in Videos

August 26, 2025
Authors: Chenxuan Miao, Yutong Feng, Jianshu Zeng, Zixiang Gao, Hantang Liu, Yunfeng Yan, Donglian Qi, Xi Chen, Bin Wang, Hengshuang Zhao
cs.AI

Abstract

Video object removal has advanced considerably thanks to the recent success of video generative models. However, existing methods struggle to remove the side effects of objects, such as their shadows and reflections, because paired video data for supervision is scarce. This paper presents ROSE (Remove Objects with Side Effects), a framework that systematically studies an object's effects on its environment, which fall into five common categories: shadows, reflections, light, translucency, and mirror. Because curating paired videos that exhibit these effects is difficult, we use a 3D rendering engine to generate synthetic data. We carefully construct a fully automatic data preparation pipeline that simulates a large-scale paired dataset with diverse scenes, objects, shooting angles, and camera trajectories. ROSE is implemented as a video inpainting model built on a diffusion transformer. To localize all object-correlated areas, the entire video is fed into the model for reference-based erasing. In addition, extra supervision is introduced to explicitly predict the areas affected by side effects, which can be revealed through the differential mask between the paired videos. To fully evaluate model performance on removing the various side effects, we present a new benchmark, ROSE-Bench, which covers both common scenarios and the five special side effects. Experimental results demonstrate that ROSE outperforms existing video object erasing models and generalizes well to real-world video scenarios. The project page is https://rose2025-inpaint.github.io/.
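The abstract states that side-effect regions can be revealed through the differential mask between the paired videos, but it does not say how that mask is computed. The following is a minimal sketch of one plausible reading, assuming a per-pixel absolute difference with a fixed threshold. The function name `difference_mask`, the threshold value, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit RGB value
Frame = List[List[Pixel]]      # rows of pixels
Video = List[Frame]            # frames in time order


def difference_mask(with_object: Video, without_object: Video,
                    threshold: int = 10) -> List[List[List[int]]]:
    """Return one binary mask per frame: 1 where the paired frames differ.

    The threshold is an assumed tolerance for small rendering differences.
    """
    if len(with_object) != len(without_object):
        raise ValueError("paired videos must have the same number of frames")
    masks = []
    for frame_a, frame_b in zip(with_object, without_object):
        mask = [
            [
                # A pixel counts as changed if any channel differs by more
                # than the threshold.
                int(max(abs(a - b) for a, b in zip(px_a, px_b)) > threshold)
                for px_a, px_b in zip(row_a, row_b)
            ]
            for row_a, row_b in zip(frame_a, frame_b)
        ]
        masks.append(mask)
    return masks


# Toy pair: a single 2x3 frame with a uniform gray background.
background: Video = [[[(100, 100, 100)] * 3 for _ in range(2)]]
with_object: Video = [[list(row) for row in frame] for frame in background]
with_object[0][0][0] = (20, 20, 20)     # the object itself
with_object[0][1][0] = (70, 70, 70)     # its shadow on the ground
with_object[0][1][2] = (103, 100, 98)   # small noise, below the threshold

masks = difference_mask(with_object, background)
print(masks)
```

In this toy pair the mask marks both the object pixel and the shadow pixel, while the small noise difference is ignored. A mask of this kind covers the object together with its side effects, which is the region the abstract says the extra supervision predicts.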