CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Abstract

We present the content deformation field (CoDeF) as a new type of video representation. It consists of a canonical content field, which aggregates the static contents of the entire video, and a temporal deformation field, which records the transformations from the canonical image (i.e., the image rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We deliberately introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. The project page can be found at https://qiuyu96.github.io/CoDeF/.
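The lifting idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the learned canonical content field is replaced here by a small grid of pixel values, and the learned temporal deformation field by a hand-written function. The names `render_frame`, `shift_right`, and `invert` are hypothetical. The sketch also assumes that the deformation is expressed as a lookup from frame coordinates `(x, y, t)` into canonical coordinates, sampled with nearest-neighbour rounding.

```python
def render_frame(canonical, deform, t):
    """Render frame t by sampling the canonical image at deformed coordinates.

    canonical: 2D list of pixel values (stand-in for the canonical content field).
    deform:    function (x, y, t) -> (cx, cy) giving canonical coordinates
               (stand-in for the temporal deformation field; an assumption).
    """
    height, width = len(canonical), len(canonical[0])
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            cx, cy = deform(x, y, t)
            # Nearest-neighbour sampling, clamped to the image borders.
            ix = min(max(int(round(cx)), 0), width - 1)
            iy = min(max(int(round(cy)), 0), height - 1)
            row.append(canonical[iy][ix])
        frame.append(row)
    return frame


def shift_right(x, y, t):
    """Toy deformation: the content moves one pixel to the right per frame."""
    return x - t, y


def invert(pixel):
    """Toy image algorithm, applied once to the canonical image only."""
    return 255 - pixel


canonical = [[0, 50, 100, 150],
             [10, 60, 110, 160]]

# Apply the image algorithm to the single canonical image...
edited = [[invert(p) for p in row] for row in canonical]

# ...then propagate the result to every frame through the deformation.
video = [render_frame(edited, shift_right, t) for t in range(3)]
```

Because every frame is rendered from the same edited canonical image, the edit cannot drift from frame to frame in this toy setting; only the deformation changes over time.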
