EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
March 19, 2026
Authors: Yang Fu, Yike Zheng, Ziyun Dai, Henghui Ding
cs.AI
Abstract
Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects but often struggle to erase these effects and to synthesize coherent backgrounds. Beyond method limitations, progress is further hampered by the lack of a comprehensive dataset that systematically captures common object effects across varied environments for training and evaluation. To address this, we introduce VOR (Video Object Removal), a large-scale dataset of diverse paired videos. Each pair consists of one video in which the target object is present with its effects and a counterpart in which the object and its effects are absent, together with the corresponding object masks. VOR contains 60K high-quality video pairs from captured and synthetic sources, covers five effect types, and spans a wide range of object categories as well as complex, dynamic multi-object scenes. Building on VOR, we propose EffectErase, an effect-aware video object removal method that treats video object insertion as the inverse auxiliary task within a reciprocal learning scheme. The model includes task-aware region guidance that focuses learning on affected areas and enables flexible task switching. It also uses an insertion-removal consistency objective that encourages complementary behaviors and shared localization of effect regions and structural cues. Trained on VOR, EffectErase achieves superior performance in extensive experiments, delivering high-quality video object effect erasing across diverse scenarios.
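The paired structure described above can be illustrated with a minimal sketch. It is not the authors' code or the actual VOR file format: the `PairedSample` container, its field names, the helper functions, and the difference threshold are all hypothetical, chosen only to show how one pair can serve both the removal task and its inverse insertion task, and why the affected region can extend beyond the object mask.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]   # one grayscale frame: rows of pixel intensities in [0, 1]
Video = List[Frame]         # a video is a list of frames


@dataclass
class PairedSample:
    """Hypothetical container for one VOR-style triplet (field names are assumptions)."""
    with_object: Video      # target object and its effects present
    without_object: Video   # same scene, object and effects absent
    object_mask: Video      # 1.0 where the object itself is, else 0.0


def make_task_examples(sample):
    """Derive both reciprocal tasks from one pair: removal and its inverse, insertion."""
    removal = {"task": "remove", "source": sample.with_object, "target": sample.without_object}
    insertion = {"task": "insert", "source": sample.without_object, "target": sample.with_object}
    return removal, insertion


def effect_region(sample, threshold=0.05):
    """Mark pixels that differ between the two videos but lie outside the object mask.

    In this toy setup such pixels stand in for effects like shadows or reflections.
    The threshold is an arbitrary illustrative value.
    """
    region = []
    for f_with, f_without, f_mask in zip(
        sample.with_object, sample.without_object, sample.object_mask
    ):
        region.append([
            [1.0 if abs(a - b) > threshold and m < 0.5 else 0.0
             for a, b, m in zip(row_a, row_b, row_m)]
            for row_a, row_b, row_m in zip(f_with, f_without, f_mask)
        ])
    return region


# Toy one-frame, 1x4 video: the object covers pixel 1 and its shadow darkens pixel 2.
sample = PairedSample(
    with_object=[[[0.8, 0.1, 0.5, 0.8]]],
    without_object=[[[0.8, 0.8, 0.8, 0.8]]],
    object_mask=[[[0.0, 1.0, 0.0, 0.0]]],
)
removal, insertion = make_task_examples(sample)
shadow = effect_region(sample)
```

In this toy example the object mask covers only pixel 1, while `effect_region` flags pixel 2, the shadow. A mask-only inpainting approach would leave that pixel untouched, which is the failure mode the abstract attributes to existing methods.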