Fast View Synthesis of Casual Videos
December 4, 2023
Authors: Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, Feng Liu
cs.AI
Abstract
Novel view synthesis from an in-the-wild video is difficult due to challenges
like scene dynamics and lack of parallax. While existing methods have shown
promising results with implicit neural radiance fields, they are slow to train
and render. This paper revisits explicit video representations to synthesize
high-quality novel views from a monocular video efficiently. We treat static
and dynamic video content separately. Specifically, we build a global static
scene model using an extended plane-based scene representation to synthesize
temporally coherent novel videos. Our plane-based scene representation is
augmented with spherical harmonics and displacement maps to capture
view-dependent effects and model non-planar complex surface geometry. We opt to
represent the dynamic content as per-frame point clouds for efficiency. While
such a representation is prone to inconsistency, minor temporal inconsistencies
are perceptually masked by motion. We develop a method to quickly estimate
such a hybrid video representation and render novel views in real time. Our
experiments show that our method can render high-quality novel views from an
in-the-wild video with comparable quality to state-of-the-art methods while
being 100x faster in training and enabling real-time rendering.
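
As a rough illustration of the view-dependent shading the abstract refers to (the plane textures "augmented with spherical harmonics"), the sketch below evaluates per-texel RGB from degree-2 real SH coefficients given a viewing direction. This is not the authors' implementation; the function name, array layout, and the degree-2 truncation are assumptions made for the example.

```python
# Minimal sketch (assumed, not the authors' code): view-dependent color from
# per-texel spherical-harmonic (SH) coefficients on a plane texture.
import numpy as np

SH_C0 = 0.28209479177387814            # l = 0 constant
SH_C1 = 0.4886025119029199             # l = 1 constant
SH_C2 = [1.0925484305920792, -1.0925484305920792,
         0.31539156525252005, -1.0925484305920792, 0.5462742152960396]

def eval_sh_color(sh_coeffs: np.ndarray, view_dir: np.ndarray) -> np.ndarray:
    """Evaluate degree-2 real SH for a unit view direction.

    sh_coeffs: (..., 9, 3) SH coefficients per texel (RGB channels).
    view_dir:  (3,) vector from the texel toward the camera.
    Returns (..., 3) view-dependent RGB.
    """
    x, y, z = view_dir / np.linalg.norm(view_dir)
    basis = np.array([
        SH_C0,
        -SH_C1 * y, SH_C1 * z, -SH_C1 * x,
        SH_C2[0] * x * y, SH_C2[1] * y * z,
        SH_C2[2] * (3.0 * z * z - 1.0),
        SH_C2[3] * x * z, SH_C2[4] * (x * x - y * y),
    ])
    # Weighted sum of the 9 basis functions for each texel and channel.
    return np.einsum("...kc,k->...c", sh_coeffs, basis)

# Toy usage: a 4x4 texture with random coefficients, viewed along +z.
texture_sh = np.random.randn(4, 4, 9, 3) * 0.1
rgb = eval_sh_color(texture_sh, np.array([0.0, 0.0, 1.0]))
print(rgb.shape)  # (4, 4, 3)
```

In a scheme like this, each texel stores only a handful of coefficients, so view-dependent appearance can be baked into explicit textures and evaluated cheaply per frame, which is consistent with the paper's emphasis on fast training and real-time rendering.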