CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Abstract

We present the content deformation field (CoDeF) as a new type of video representation. It consists of a canonical content field, which aggregates the static contents of the entire video, and a temporal deformation field, which records the transformations from the canonical image (i.e., the image rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We deliberately introduce regularizations into the optimization process, encouraging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms to video processing: one can apply an image algorithm to the canonical image and propagate the outcome to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, because our lifting strategy deploys the algorithm on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. The project page is available at https://qiuyu96.github.io/CoDeF/.
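The lifting idea described above can be illustrated with a toy sketch. This is not the authors' implementation: in CoDeF both fields are optimized per video, whereas here a small hand-written canonical image and fixed shift functions stand in for them. All names (`render_frame`, `shift_right`, `invert`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
# Assumption: a deformation maps a frame pixel (x, y) to canonical coordinates (u, v).
Deformation = Callable[[int, int], Tuple[float, float]]


def render_frame(canonical: Image, deform: Deformation, height: int, width: int) -> Image:
    """Render one frame by sampling the canonical image at deformed coordinates."""
    ch, cw = len(canonical), len(canonical[0])
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            u, v = deform(x, y)
            # Nearest-neighbour lookup, clamped to the canonical image bounds.
            ui = min(max(int(round(u)), 0), cw - 1)
            vi = min(max(int(round(v)), 0), ch - 1)
            row.append(canonical[vi][ui])
        frame.append(row)
    return frame


def shift_right(dx: int) -> Deformation:
    """Toy deformation: the content appears dx pixels further right in this frame."""
    return lambda x, y: (x - dx, y)


def invert(image: Image) -> Image:
    """Stand-in for any image algorithm (for example, an image translation model)."""
    return [[1.0 - p for p in row] for row in image]


canonical = [[0.0, 0.25, 0.5, 1.0],
             [1.0, 0.5, 0.25, 0.0]]
deformations = [shift_right(t) for t in range(3)]  # one deformation per frame

# Run the image algorithm once, on the canonical image only.
edited_canonical = invert(canonical)
# Propagate the edited result to every frame through the deformation field.
edited_video = [render_frame(edited_canonical, d, 2, 4) for d in deformations]
```

Because every frame is rendered from the same edited canonical image, the edit cannot drift from frame to frame, which is the intuition behind the cross-frame consistency claimed in the abstract.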
