INVE: Interactive Neural Video Editing

Abstract

We present Interactive Neural Video Editing (INVE), a real-time video editing solution that assists the editing process by consistently propagating sparse frame edits to the entire video clip. Our method is inspired by recent work on Layered Neural Atlas (LNA). LNA, however, suffers from two major drawbacks: (1) it is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases, including direct frame editing and rigid texture tracking. To address these challenges, we adopt highly efficient network architectures, powered by hash-grid encoding, to substantially improve processing speed. In addition, we learn bidirectional functions between the image and the atlas and introduce vectorized editing, which together enable a much greater variety of edits, both in the atlas and directly in the frames. Compared to LNA, INVE reduces learning and inference time by a factor of 5 and supports various video editing operations that LNA cannot. We demonstrate the superiority of INVE over LNA in interactive video editing through comprehensive quantitative and qualitative analysis. For video results, please see https://gabriel-huang.github.io/inve/

