ActAnywhere: Subject-Aware Video Background Generation

January 19, 2024
Authors: Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang
cs.AI

Abstract

Generating a video background that is tailored to foreground subject motion is an important problem for the movie industry and the visual effects community. This task involves synthesizing a background that aligns with the motion and appearance of the foreground subject, while also complying with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual effort. Our model leverages the power of large-scale video diffusion models and is specifically tailored for this task. ActAnywhere takes a sequence of foreground subject segmentations as input and an image describing the desired scene as a condition, and produces a coherent video with realistic foreground-background interactions while adhering to the condition frame. We train our model on a large-scale dataset of human-scene interaction videos. Extensive evaluations demonstrate the superior performance of our model, which significantly outperforms baselines. Moreover, we show that ActAnywhere generalizes to diverse out-of-distribution samples, including non-human subjects. Please visit our project webpage at https://actanywhere.github.io.
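
The abstract specifies the model's interface: a sequence of foreground subject segmentations plus a single condition image in, a coherent background video out. The Python/NumPy sketch below illustrates only that data flow; generate_background and all tensor shapes are our own assumptions, and the naive static compositing is a placeholder for the paper's actual video diffusion model, which would synthesize a moving, interaction-aware background.

```python
import numpy as np

def generate_background(fg_frames: np.ndarray,
                        fg_masks: np.ndarray,
                        condition_frame: np.ndarray) -> np.ndarray:
    """Illustrative stand-in for the interface described in the abstract.

    fg_frames:       (T, H, W, 3) foreground subject frames
    fg_masks:        (T, H, W)    binary subject segmentation masks
    condition_frame: (H, W, 3)    image describing the desired scene

    Returns a (T, H, W, 3) video. A real model would generate a coherent,
    motion-consistent background; here we simply repeat the condition frame.
    """
    t = fg_frames.shape[0]
    # Tile the condition frame across time as a trivial "background video".
    background = np.broadcast_to(condition_frame, (t,) + condition_frame.shape)
    mask = fg_masks[..., None].astype(fg_frames.dtype)
    # Composite the segmented subject over the (static) condition scene.
    return mask * fg_frames + (1.0 - mask) * background


# Toy usage with random data; shapes mirror the inputs named in the abstract.
T, H, W = 16, 64, 64
video = generate_background(
    np.random.rand(T, H, W, 3).astype(np.float32),
    np.random.rand(T, H, W) > 0.5,
    np.random.rand(H, W, 3).astype(np.float32),
)
print(video.shape)  # (16, 64, 64, 3)
```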