INVE: Interactive Neural Video Editing
July 15, 2023
Authors: Jiahui Huang, Leonid Sigal, Kwang Moo Yi, Oliver Wang, Joon-Young Lee
cs.AI
Abstract
We present Interactive Neural Video Editing (INVE), a real-time video editing
solution, which can assist the video editing process by consistently
propagating sparse frame edits to the entire video clip. Our method is inspired
by the recent work on Layered Neural Atlas (LNA). LNA, however, suffers from
two major drawbacks: (1) the method is too slow for interactive editing, and
(2) it offers insufficient support for some editing use cases, including direct
frame editing and rigid texture tracking. To address these challenges, we
adopt highly efficient network architectures, powered by hash-grid
encoding, to substantially improve processing speed. In addition, we
learn bi-directional functions between images and the atlas, and introduce
vectorized editing, which collectively enable a much greater variety of edits,
both in the atlas and directly in the frames. Compared to LNA, our INVE reduces the learning
and inference time by a factor of 5, and supports various video editing
operations that LNA cannot. We showcase the superiority of INVE over LNA in
interactive video editing through a comprehensive quantitative and qualitative
analysis, highlighting its numerous advantages and improved performance. For
video results, please see https://gabriel-huang.github.io/inve/
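The speed-up the abstract attributes to hash-grid encoding refers to the multiresolution spatial-hash feature lookup popularized by Instant-NGP. The abstract gives no implementation details, so the following is only a minimal NumPy sketch of how such an encoding maps a 2D coordinate to a feature vector; the level count, resolutions, and table sizes here are illustrative assumptions, not INVE's actual configuration.

```python
import numpy as np

def hash_coords(ix, iy, table_size):
    # Spatial hash in the Instant-NGP style: XOR of integer coordinates
    # multiplied by large primes, taken modulo the hash-table size.
    return ((ix * 1) ^ (iy * 2_654_435_761)) % table_size

def encode_2d(x, y, tables, resolutions):
    """Multiresolution hash-grid encoding of a point (x, y) in [0, 1]^2."""
    feats = []
    for table, res in zip(tables, resolutions):
        T, _ = table.shape
        gx, gy = x * res, y * res
        x0, y0 = int(np.floor(gx)), int(np.floor(gy))
        tx, ty = gx - x0, gy - y0
        # Fetch the four surrounding corner features via the spatial hash.
        f00 = table[hash_coords(x0,     y0,     T)]
        f10 = table[hash_coords(x0 + 1, y0,     T)]
        f01 = table[hash_coords(x0,     y0 + 1, T)]
        f11 = table[hash_coords(x0 + 1, y0 + 1, T)]
        # Bilinearly interpolate the corner features.
        f = (f00 * (1 - tx) * (1 - ty) + f10 * tx * (1 - ty)
             + f01 * (1 - tx) * ty + f11 * tx * ty)
        feats.append(f)
    # Concatenate features across all levels into one encoding vector.
    return np.concatenate(feats)

# Toy configuration: 4 levels, 2**14-entry tables, 2 features per entry.
rng = np.random.default_rng(0)
levels, T, F = 4, 2**14, 2
resolutions = [16, 32, 64, 128]
tables = [rng.normal(size=(T, F)).astype(np.float32) for _ in range(levels)]
z = encode_2d(0.3, 0.7, tables, resolutions)
print(z.shape)  # (8,) -- levels * F features
```

In practice the tables are trainable parameters and the concatenated encoding feeds a small MLP; replacing dense coordinate MLPs with this lookup is what makes the per-frame evaluation fast enough for interactive use.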