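ActAnywhere: Subject-Aware Video Background Generation

Abstract

Generating a video background that matches the motion of a foreground subject is an important problem for the film industry and the visual effects community. The task involves synthesizing a background that fits the motion and appearance of the foreground subject while also respecting the artist's creative intent. We introduce ActAnywhere, a generative model that automates this process. The model replaces a process that has traditionally required manual work, and it is tailored to this task by leveraging the power of large-scale video diffusion models. ActAnywhere takes a segmentation sequence of the foreground subject as input and uses an image describing the desired scene as a condition, producing a coherent video with realistic foreground-background interactions that adheres to the condition frame. We train the model on a large-scale dataset of human-scene interaction videos. Extensive evaluations demonstrate the model's superior performance, with results that significantly outperform baselines. We also show that ActAnywhere generalizes to diverse out-of-distribution samples, including non-human subjects. For details, please visit the project webpage (https://actanywhere.github.io).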

English

Generating video backgrounds tailored to foreground subject motion is an important problem for the movie industry and visual effects community. This task involves synthesizing a background that aligns with the motion and appearance of the foreground subject while also complying with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual effort. Our model leverages the power of large-scale video diffusion models and is specifically tailored for this task. ActAnywhere takes a sequence of foreground subject segmentations as input and an image that describes the desired scene as a condition, and produces a coherent video with realistic foreground-background interactions that adheres to the condition frame. We train our model on a large-scale dataset of human-scene interaction videos. Extensive evaluations demonstrate the superior performance of our model, which significantly outperforms baselines. Moreover, we show that ActAnywhere generalizes to diverse out-of-distribution samples, including non-human subjects. Please visit our project webpage at https://actanywhere.github.io.
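
The abstract names two inputs: a per-frame foreground segmentation sequence and a single condition image. The following is a minimal sketch of how those inputs might be prepared, not the authors' implementation. The abstract does not specify data layouts or an API, so every name here (`segment_foreground`, `GenerationInput`, `build_input`) is hypothetical, frames are simplified to small grayscale grids, and the assumption that the segmentation is obtained by applying a binary subject mask to each frame is ours.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # one H x W grayscale frame (hypothetical simplification)


def segment_foreground(frame: Image, mask: Image) -> Image:
    """Keep pixels where the mask is 1 and zero out the background."""
    return [[px * m for px, m in zip(frow, mrow)] for frow, mrow in zip(frame, mask)]


@dataclass
class GenerationInput:
    """Hypothetical container for the two inputs named in the abstract."""
    foreground_frames: List[Image]  # per-frame foreground subject segmentation
    condition_image: Image          # single image describing the desired scene


def build_input(frames: List[Image], masks: List[Image],
                condition_image: Image) -> GenerationInput:
    # Assumption: one binary subject mask is available for every frame.
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    foreground = [segment_foreground(f, m) for f, m in zip(frames, masks)]
    return GenerationInput(foreground, condition_image)


# Two 2x2 frames with the subject occupying the left column.
frames = [[[0.9, 0.2], [0.8, 0.1]], [[0.7, 0.3], [0.6, 0.4]]]
masks = [[[1, 0], [1, 0]], [[1, 0], [1, 0]]]
scene = [[0.5, 0.5], [0.5, 0.5]]
example = build_input(frames, masks, scene)
```

In this sketch the background pixels are zeroed so that only the subject remains in each frame; a generative model would then be responsible for filling in the background in a way that is consistent with `condition_image`.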
