Training-free Camera Control for Video Generation

Abstract

We propose a training-free and robust solution that adds camera movement control to off-the-shelf video diffusion models. Unlike previous work, our method requires neither supervised finetuning on camera-annotated datasets nor self-supervised training via data augmentation. Instead, it works plug-and-play with most pretrained video diffusion models and generates camera-controllable videos from a single image or text prompt. Our work is inspired by the layout prior that intermediate latents hold over the generated results: rearranging the noisy pixels in these latents reallocates the output content accordingly. Because camera movement can also be seen as a pixel rearrangement caused by perspective change, a video can be reorganized to follow a specific camera motion if its noisy latents change accordingly. Building on this, we propose CamTrol, which enables robust camera control for video diffusion models through a two-stage process. First, we model image layout rearrangement through explicit camera movement in 3D point cloud space. Second, we generate videos with camera motion using the layout prior of noisy latents formed from a series of rearranged images. Extensive experiments demonstrate the robustness of our method in controlling the camera motion of generated videos. Furthermore, we show that our method produces impressive results when generating 3D rotation videos with dynamic content. Project page: https://lifedecoder.github.io/CamTrol/.
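The two-stage idea can be illustrated with a minimal NumPy sketch. This is not the CamTrol implementation, only a toy under stated assumptions: a depth map and pinhole intrinsics K are assumed to be given, pixel arrays stand in for the latents of a real video diffusion model, the noising step uses the standard forward-diffusion formula with an assumed cumulative noise level alpha_bar_t, and the function names (unproject, render_view, noisy_latents) are hypothetical. Denoising with a pretrained video model is omitted.

```python
import numpy as np

def unproject(depth, K):
    """Lift every pixel to a 3D point in the camera frame (pinhole model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T
    return rays * depth.reshape(-1, 1)

def render_view(image, depth, K, R, t):
    """Stage 1 (toy): re-render the image after applying the rigid transform
    p' = R p + t to the point cloud, splatting each point to its nearest pixel."""
    h, w = depth.shape
    c = image.shape[-1]
    pts = unproject(depth, K) @ R.T + t
    colors = image.reshape(-1, c)
    front = pts[:, 2] > 1e-6                      # keep points in front of the camera
    pts, colors = pts[front], colors[front]
    proj = pts @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    flat = (v * w + u)[inside]
    z, colors = pts[inside, 2], colors[inside]
    # Z-buffer: sort by target pixel, then by depth, and keep the nearest point.
    order = np.lexsort((z, flat))
    first = np.unique(flat[order], return_index=True)[1]
    keep = order[first]
    out = np.zeros((h * w, c), dtype=image.dtype)  # unfilled pixels stay zero (holes)
    out[flat[keep]] = colors[keep]
    return out.reshape(h, w, c)

def noisy_latents(frames, alpha_bar_t, rng):
    """Stage 2 (toy): forward-diffuse the rearranged frames so they act as a
    layout prior: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(frames.shape)
    return np.sqrt(alpha_bar_t) * frames + np.sqrt(1.0 - alpha_bar_t) * eps

# Toy usage: a random 8x8 "image" on a flat plane at depth 2.
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
depth = np.full((8, 8), 2.0)
K = np.array([[8.0, 0.0, 4.0], [0.0, 8.0, 4.0], [0.0, 0.0, 1.0]])
# Hypothetical trajectory: four poses with increasing translation along x.
poses = [(np.eye(3), np.array([0.25 * i, 0.0, 0.0])) for i in range(4)]
frames = np.stack([render_view(image, depth, K, R, t) for R, t in poses])
latents = noisy_latents(frames, alpha_bar_t=0.5, rng=rng)
```

With these toy values, each successive pose shifts the rendered image by one more pixel, so the noised frames carry a layout that already follows the camera trajectory. In a real pipeline, the noisy latents would then be handed to a pretrained video diffusion model for denoising.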
