Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
March 12, 2025
Authors: Hyeonho Jeong, Suhyeon Lee, Jong Chul Ye
cs.AI
Abstract
We introduce Reangle-A-Video, a unified framework for generating synchronized
multi-view videos from a single input video. Unlike mainstream approaches that
train multi-view video diffusion models on large-scale 4D datasets, our method
reframes the multi-view video generation task as video-to-videos translation,
leveraging publicly available image and video diffusion priors. In essence,
Reangle-A-Video operates in two stages. (1) Multi-View Motion Learning: An
image-to-video diffusion transformer is synchronously fine-tuned in a
self-supervised manner to distill view-invariant motion from a set of warped
videos. (2) Multi-View Consistent Image-to-Images Translation: The first frame
of the input video is warped and inpainted into various camera perspectives
under an inference-time cross-view consistency guidance using DUSt3R,
generating multi-view consistent starting images. Extensive experiments on
static view transport and dynamic camera control show that Reangle-A-Video
surpasses existing methods, establishing a new solution for multi-view video
generation. We will publicly release our code and data. Project page:
https://hyeonho99.github.io/reangle-a-video/
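
The abstract outlines the two-stage pipeline but gives no implementation details. The toy sketch below only mirrors that data flow under heavy simplifications: view warping is reduced to a horizontal pixel shift with NaN disocclusion holes, stage 1's synchronous diffusion fine-tuning is replaced by averaging frame differences across warped views into one shared motion signal, and stage 2's DUSt3R-guided inpainting is replaced by filling holes from a cross-view consensus image. Every function name here is hypothetical, not from the authors' code.

```python
import numpy as np

def warp(frames: np.ndarray, shift: int) -> np.ndarray:
    # Stand-in for depth-based view warping: shift pixels horizontally and
    # mark disoccluded regions as NaN holes. frames has shape (T, H, W).
    out = np.full_like(frames, np.nan)
    w = frames.shape[-1]
    if shift >= 0:
        out[..., shift:] = frames[..., : w - shift]
    else:
        out[..., :shift] = frames[..., -shift:]
    return out

def distill_motion(warped_views):
    # Stage 1 stand-in ("Multi-View Motion Learning"): the paper synchronously
    # fine-tunes an image-to-video diffusion transformer on the warped videos;
    # here we just average per-frame displacements across views, skipping
    # holes, to obtain one shared, view-invariant motion signal.
    diffs = np.stack([np.diff(v, axis=0) for v in warped_views])
    return np.nanmean(diffs, axis=0)  # (T-1, H, W)

def starting_images(first_frame, shifts):
    # Stage 2 stand-in ("Multi-View Consistent Image-to-Images Translation"):
    # warp the input's first frame into every target view, then fill each
    # view's holes from a shared cross-view consensus image, a crude echo of
    # the paper's DUSt3R-guided inpainting.
    warped = [warp(first_frame[None], s)[0] for s in shifts]
    consensus = np.nanmean(np.stack(warped), axis=0)
    return [np.where(np.isnan(v), consensus, v) for v in warped]

def generate(start, motion):
    # Stand-in for the fine-tuned image-to-video model: roll the shared motion
    # forward from a given starting image to synthesize one view's video.
    frames = [start]
    for step in np.nan_to_num(motion):
        frames.append(frames[-1] + step)
    return np.stack(frames)

# End-to-end toy run on a random 8-frame, 32x32 "input video".
rng = np.random.default_rng(0)
video = rng.random((8, 32, 32))
shifts = [-8, 0, 8]                                        # three target views
motion = distill_motion([warp(video, s) for s in shifts])  # stage 1
starts = starting_images(video[0], shifts)                 # stage 2
multi_view = [generate(s, motion) for s in starts]         # synchronized videos
print([v.shape for v in multi_view])                       # [(8, 32, 32)] * 3
```

Note the structural point the sketch preserves: motion is distilled once from all warped views and shared across outputs, while each view gets its own consistent starting image, which is what lets the generated videos stay synchronized.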