

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

May 27, 2024
作者: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein
cs.AI

Abstract

Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene from multiple different camera trajectories. Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. The CVD framework includes a novel cross-video synchronization module that promotes consistency between corresponding frames of the same video rendered from different camera poses using an epipolar attention mechanism. Trained on top of a state-of-the-art camera-control module for video generation, CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines, as shown in extensive experiments. Project page: https://collaborativevideodiffusion.github.io/.
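The abstract's key mechanism is epipolar attention: when synchronizing corresponding frames across videos, each query pixel in one view should only attend to key pixels that lie near its epipolar line in the other view, as determined by the two camera poses. The sketch below is an illustrative reconstruction of that constraint, not the paper's actual implementation; the function names, the hard distance threshold, and the use of a simple relative pose `(R, t)` are all assumptions for the example.

```python
import numpy as np

def fundamental_matrix(K, R, t):
    """Fundamental matrix F = K^-T [t]_x R K^-1 for relative pose (R, t)
    between two views sharing intrinsics K. Maps a pixel in view 1 to its
    epipolar line in view 2 via l = F @ x (homogeneous coordinates)."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])  # cross-product (skew-symmetric) matrix
    K_inv = np.linalg.inv(K)
    return K_inv.T @ tx @ R @ K_inv

def epipolar_attention_mask(F, q_pix, k_pix, threshold=2.0):
    """Boolean mask of shape (Nq, Nk): True where key pixel k_j lies within
    `threshold` pixels of the epipolar line induced by query pixel q_i.
    In an attention layer, logits outside the mask would be set to -inf."""
    q_h = np.concatenate([q_pix, np.ones((len(q_pix), 1))], axis=1)  # homogeneous
    k_h = np.concatenate([k_pix, np.ones((len(k_pix), 1))], axis=1)
    lines = q_h @ F.T                        # (Nq, 3): lines a*x + b*y + c = 0
    num = np.abs(lines @ k_h.T)              # (Nq, Nk) unnormalized distances
    denom = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8
    return (num / denom) < threshold         # point-to-line distance test
```

For a pure sideways translation `t = (1, 0, 0)` with identity rotation and intrinsics, epipolar lines are horizontal, so a query at the origin attends only to keys with (near-)zero vertical offset.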
