ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
November 7, 2024
Authors: David Junhao Zhang, Roni Paiss, Shiran Zada, Nikhil Karnad, David E. Jacobs, Yael Pritch, Inbar Mosseri, Mike Zheng Shou, Neal Wadhwa, Nataniel Ruiz
cs.AI
Abstract
Recently, breakthroughs in video modeling have allowed for controllable camera trajectories in generated videos. However, these methods cannot be directly applied to user-provided videos that are not generated by a video model. In this paper, we present ReCapture, a method for generating new videos with novel camera trajectories from a single user-provided video. Our method allows us to re-generate the reference video, with all its existing scene motion, from vastly different angles and with cinematic camera motion. Notably, using our method we can also plausibly hallucinate parts of the scene that were not observable in the reference video. Our method works by (1) generating a noisy anchor video with a new camera trajectory using multiview diffusion models or depth-based point cloud rendering and then (2) regenerating the anchor video into a clean and temporally consistent reangled video using our proposed masked video fine-tuning technique.
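
As a concrete illustration of stage (1), below is a minimal sketch of depth-based point cloud rendering: each source frame is lifted to a 3D point cloud using per-pixel depth, then splatted into a new camera to form one frame of the noisy anchor video. This is not the authors' implementation; the helper names (`unproject`, `render_anchor_frame`), the pinhole camera model, and the assumption that per-frame depth maps and relative camera poses are available are all illustrative choices.

```python
# Sketch of stage (1): depth-based point cloud rendering of a noisy anchor
# frame under a new camera. Assumptions (not from the paper): pinhole
# intrinsics K are known, per-frame depth comes from some monocular depth
# estimator, and `rel_pose` is the 4x4 transform from the source camera
# frame to the new camera frame. All function names are hypothetical.
import numpy as np

def unproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift every pixel (u, v) with depth d to a 3D point in camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))       # pixel grids, (h, w)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T                      # per-pixel ray directions
    return rays * depth.reshape(-1, 1)                   # scale rays by depth

def render_anchor_frame(rgb, depth, K, rel_pose):
    """Forward-splat one source frame into the new camera. Pixels that
    receive no points stay black; those holes are what stage (2) repaints."""
    h, w, _ = rgb.shape
    pts = unproject(depth.astype(np.float64), K)         # (h*w, 3)
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    cam = (pts_h @ rel_pose.T)[:, :3]                    # points in new camera frame
    z = cam[:, 2]
    uvw = cam @ K.T                                      # pinhole projection
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    ui = np.round(uv[:, 0]).astype(int)
    vi = np.round(uv[:, 1]).astype(int)
    ok = (z > 0) & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    out = np.zeros_like(rgb)
    zbuf = np.full((h, w), np.inf)
    src = rgb.reshape(-1, rgb.shape[-1])
    for i in np.flatnonzero(ok):                         # naive z-buffered splat
        if z[i] < zbuf[vi[i], ui[i]]:
            zbuf[vi[i], ui[i]] = z[i]
            out[vi[i], ui[i]] = src[i]
    return out

# Hypothetical usage: applying this per frame along a smooth new trajectory
# yields the noisy anchor video that stage (2) then regenerates.
# anchor = [render_anchor_frame(f, d, K, poses[t])
#           for t, (f, d) in enumerate(zip(frames, depths))]
```

The occlusion holes and out-of-frame regions left by this rendering are exactly the parts that stage (2), the proposed masked video fine-tuning, is responsible for hallucinating into a clean, temporally consistent reangled video.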