OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models

Abstract

Omnimatte aims to decompose a given video into semantically meaningful layers: the background and individual objects together with their associated effects, such as shadows and reflections. Existing methods often require extensive training or costly self-supervised optimization. In this paper, we present OmnimatteZero, a training-free approach that leverages off-the-shelf pre-trained video diffusion models for omnimatte. It can remove objects from videos, extract individual object layers along with their effects, and composite those objects onto new videos. We accomplish this by adapting zero-shot image inpainting techniques to video object removal, a task they fail to handle effectively out of the box. We then show that self-attention maps capture information about the object and its footprints, and we use them to inpaint the object's effects, leaving a clean background. Additionally, through simple latent arithmetic, object layers can be isolated and seamlessly recombined with new video layers to produce new videos. Evaluations show that OmnimatteZero not only achieves superior background reconstruction but also sets a new record as the fastest Omnimatte approach, achieving real-time performance with minimal per-frame runtime.
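The abstract does not spell out what the "simple latent arithmetic" is. As a rough illustration only, the sketch below assumes one plausible reading: the object layer is the residual between the latent of the original video and the latent of the object-free background, and compositing adds that residual to the latent of a new background. The function names, the flat-list stand-in for latent tensors, and the subtraction/addition rule itself are all assumptions made for this example, not details confirmed by the text.

```python
from typing import List

# Toy stand-in for a video latent tensor, flattened to a list of floats.
# A real latent would come from a video diffusion model's encoder (assumed).
Latent = List[float]


def extract_object_layer(z_video: Latent, z_background: Latent) -> Latent:
    """Hypothetical: object-plus-effects latent as the residual between the
    full-video latent and the object-free background latent."""
    return [v - b for v, b in zip(z_video, z_background)]


def composite(z_new_background: Latent, z_object: Latent) -> Latent:
    """Hypothetical: place the object layer onto a new background by
    adding its latent residual to the new background latent."""
    return [n + o for n, o in zip(z_new_background, z_object)]


# Illustrative values only; they are not data from the paper.
z_background = [0.5, -1.0, 2.0, 0.25]
z_object_true = [0.0, 1.0, 1.0, 0.0]  # object occupies the middle entries
z_video = [b + o for b, o in zip(z_background, z_object_true)]

z_object = extract_object_layer(z_video, z_background)
z_new_background = [1.5, 0.0, -0.5, 3.0]
z_composite = composite(z_new_background, z_object)
```

In this toy setting the residual recovers the object layer exactly because the video latent was built by addition. With a real encoder that would hold only approximately, so any actual pipeline would likely need further refinement after decoding.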
