

Over++: Generative Video Compositing for Layer Interaction Effects

December 22, 2025
作者: Luchao Qi, Jiaye Wu, Jun Myeong Choi, Cary Phillips, Roni Sengupta, Dan B Goldman
cs.AI

Abstract

In professional video compositing workflows, artists must manually create environmental interactions, such as shadows, reflections, dust, and splashes, between foreground subjects and background layers. Existing video generative models struggle to preserve the input video while adding such effects, and current video inpainting methods either require costly per-frame masks or yield implausible results. We introduce augmented compositing, a new task that synthesizes realistic, semi-transparent environmental effects conditioned on text prompts and input video layers, while preserving the original scene. To address this task, we present Over++, a video effect generation framework that makes no assumptions about camera pose, scene stationarity, or depth supervision. We construct a paired effect dataset tailored for this task and introduce an unpaired augmentation strategy that preserves text-driven editability. Our method also supports optional mask control and keyframe guidance without requiring dense annotations. Despite training on limited data, Over++ produces diverse and realistic environmental effects and outperforms existing baselines in both effect generation and scene preservation.
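The name Over++ alludes to the classic Porter-Duff "over" operator, the standard way a semi-transparent foreground layer (such as a generated effect) is composited onto a background in traditional workflows. A minimal per-pixel sketch, purely illustrative and not the paper's model:

```python
def over(fg, fg_alpha, bg, bg_alpha=1.0):
    """Composite one straight-alpha RGB pixel over another.

    fg, bg: tuples of RGB channel values in [0, 1]
    fg_alpha, bg_alpha: opacities in [0, 1]
    Returns (rgb, alpha) of the composited pixel.
    """
    # Porter-Duff "over": resulting opacity
    out_alpha = fg_alpha + bg_alpha * (1.0 - fg_alpha)
    if out_alpha == 0.0:
        return (0.0, 0.0, 0.0), 0.0
    # Blend each channel, weighting by layer opacities,
    # then un-premultiply by the output alpha.
    rgb = tuple(
        (fg_alpha * f + bg_alpha * (1.0 - fg_alpha) * b) / out_alpha
        for f, b in zip(fg, bg)
    )
    return rgb, out_alpha

# A 50%-opaque white "effect" pixel over an opaque black background
# composites to mid-gray.
rgb, a = over((1.0, 1.0, 1.0), 0.5, (0.0, 0.0, 0.0))
```

The augmented-compositing task goes beyond this fixed operator: rather than requiring the artist to supply the effect layer and its alpha channel, the model generates plausible, semi-transparent interaction effects directly from text and the input video layers.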
PDF · December 24, 2025