INVE: Interactive Neural Video Editing
July 15, 2023
Authors: Jiahui Huang, Leonid Sigal, Kwang Moo Yi, Oliver Wang, Joon-Young Lee
cs.AI
Abstract
We present Interactive Neural Video Editing (INVE), a real-time video editing
solution, which can assist the video editing process by consistently
propagating sparse frame edits to the entire video clip. Our method is inspired
by the recent work on Layered Neural Atlases (LNA). LNA, however, suffers from
two major drawbacks: (1) the method is too slow for interactive editing, and
(2) it offers insufficient support for some editing use cases, including direct
frame editing and rigid texture tracking. To address these challenges, we
adopt highly efficient network architectures, powered by hash-grid encoding,
to substantially improve processing speed. In addition, we learn
bi-directional functions between the image and the atlas, and introduce
vectorized editing; together these enable a much greater variety of edits,
both in the atlas and directly in the frames. Compared to LNA, our INVE reduces the learning
and inference time by a factor of 5, and supports various video editing
operations that LNA cannot. We showcase the superiority of INVE over LNA in
interactive video editing through a comprehensive quantitative and qualitative
analysis, highlighting its numerous advantages and improved performance. For
video results, please see https://gabriel-huang.github.io/inve/
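The speedup over LNA comes largely from replacing slow coordinate MLP inputs with a multiresolution hash-grid encoding (in the style of Instant NGP). The abstract does not give INVE's actual hyperparameters, so the sketch below is purely illustrative: a minimal 2D hash-grid encoder in NumPy, with made-up level counts, table sizes, and feature widths, showing how continuous coordinates are mapped to learned features via hashed grid corners and bilinear interpolation.

```python
import numpy as np

# Illustrative sketch of a 2D multiresolution hash-grid encoding.
# All sizes (levels, table size, feature dim) are assumptions for the
# example, not INVE's actual configuration.

PRIMES = np.array([1, 2654435761], dtype=np.uint64)  # spatial-hash primes

def hash_coords(coords, table_size):
    """Hash integer 2D grid coordinates into [0, table_size)."""
    c = coords.astype(np.uint64)
    h = (c[..., 0] * PRIMES[0]) ^ (c[..., 1] * PRIMES[1])
    return (h % np.uint64(table_size)).astype(np.int64)

class HashGridEncoding2D:
    def __init__(self, n_levels=4, table_size=2**14, feat_dim=2,
                 base_res=16, growth=2.0, seed=0):
        rng = np.random.default_rng(seed)
        # One small learnable feature table per resolution level.
        self.tables = [rng.normal(0.0, 1e-2, (table_size, feat_dim))
                       for _ in range(n_levels)]
        self.res = [int(base_res * growth**l) for l in range(n_levels)]
        self.table_size = table_size

    def __call__(self, xy):
        """Encode points xy in [0,1]^2 -> concatenated per-level features."""
        feats = []
        for table, res in zip(self.tables, self.res):
            pos = xy * (res - 1)
            lo = np.floor(pos).astype(np.int64)
            frac = pos - lo
            # Bilinearly interpolate features from the 4 hashed cell corners.
            f = 0.0
            for dx in (0, 1):
                for dy in (0, 1):
                    corner = lo + np.array([dx, dy])
                    w = ((frac[..., 0] if dx else 1 - frac[..., 0]) *
                         (frac[..., 1] if dy else 1 - frac[..., 1]))
                    idx = hash_coords(corner, self.table_size)
                    f = f + w[..., None] * table[idx]
            feats.append(f)
        return np.concatenate(feats, axis=-1)

enc = HashGridEncoding2D()
pts = np.array([[0.25, 0.5], [0.8, 0.1]])
print(enc(pts).shape)  # (2, 8): 4 levels x 2 features each
```

Because lookups and interpolation are cheap compared to a deep coordinate MLP, the encoder's feature tables can be optimized quickly, which is the kind of architectural change that yields the reported 5x reduction in learning and inference time.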