ActAnywhere: Subject-Aware Video Background Generation

January 19, 2024
Authors: Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang
cs.AI

Abstract

Generating video backgrounds that adapt to foreground subject motion is an important problem for the movie industry and the visual effects community. The task involves synthesizing a background that aligns with the motion and appearance of the foreground subject while also complying with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual effort. Our model leverages the power of large-scale video diffusion models and is specifically tailored for this task. ActAnywhere takes a sequence of foreground subject segmentations as input and an image describing the desired scene as the condition, and produces a coherent video with realistic foreground-background interactions that adheres to the condition frame. We train our model on a large-scale dataset of human-scene interaction videos. Extensive evaluations demonstrate the superior performance of our model, which significantly outperforms baselines. Moreover, we show that ActAnywhere generalizes to diverse out-of-distribution samples, including non-human subjects. Please visit our project webpage at https://actanywhere.github.io.
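The abstract specifies the model's interface: per-frame foreground segmentations plus one condition image in, a coherent composited video out. The sketch below illustrates how such conditioning tensors might be assembled around a diffusion-style denoiser. It is a minimal sketch under stated assumptions, not the paper's implementation: the `VideoBackgroundDenoiser` stub, the channel layout, and all shapes are hypothetical, since the abstract exposes no code interface.

```python
import torch

# Hypothetical sizes: a 16-frame clip at 256x256. The real model's
# resolution and clip length are not given in the abstract.
T, H, W = 16, 256, 256

# Inputs named in the abstract: per-frame foreground subject
# segmentations, and one condition image depicting the desired scene.
fg_frames = torch.rand(T, 3, H, W)                  # RGB frames of the subject
fg_masks = (torch.rand(T, 1, H, W) > 0.5).float()   # binary subject segmentations
condition_image = torch.rand(3, H, W)               # desired-scene condition frame

# Show the model only the subject: zero out everything outside the mask,
# then stack the masked frames with their masks as a 4-channel condition.
masked_fg = fg_frames * fg_masks
cond = torch.cat([masked_fg, fg_masks], dim=1)      # (T, 4, H, W)


class VideoBackgroundDenoiser(torch.nn.Module):
    """Stand-in for a video diffusion backbone (hypothetical)."""

    def __init__(self):
        super().__init__()
        # 3 noisy-video channels + 4 conditioning channels in, 3 out.
        self.net = torch.nn.Conv3d(3 + 4, 3, kernel_size=3, padding=1)

    def forward(self, noisy_video, cond, scene_embedding):
        # Inject the scene embedding as a per-channel bias: a toy
        # substitute for cross-attention on a condition-image embedding.
        noisy_video = noisy_video + scene_embedding.view(1, 3, 1, 1)
        x = torch.cat([noisy_video, cond], dim=1)    # (T, 7, H, W)
        x = x.permute(1, 0, 2, 3).unsqueeze(0)       # (1, 7, T, H, W) for Conv3d
        out = self.net(x)
        return out.squeeze(0).permute(1, 0, 2, 3)    # (T, 3, H, W)


model = VideoBackgroundDenoiser()
scene_embedding = condition_image.mean(dim=(1, 2))  # toy stand-in for a CLIP image embedding
noisy_video = torch.randn(T, 3, H, W)               # diffusion sampling starts from noise
denoised = model(noisy_video, cond, scene_embedding)
print(denoised.shape)                                # torch.Size([16, 3, 256, 256])
```

In a real diffusion sampler this forward pass would run once per denoising step; the single call here only demonstrates how the segmentation sequence and the condition frame enter the model together.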