CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

May 21, 2024
Authors: Andrew Marmon, Grant Schindler, José Lezama, Dan Kondratyuk, Bryan Seybold, Irfan Essa
cs.AI

Abstract

We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, so research efforts are focusing on methods of controlling the output of such models. We propose adding virtual 3D camera controls to generative video methods by conditioning the generated video on an encoding of three-dimensional camera movement over the course of the video. Results demonstrate that (1) we can successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) the generated 3D camera paths are accurate, as verified with traditional computer vision methods.
