LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

October 22, 2024
Authors: Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, Zexiang Xu
cs.AI

Abstract

We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens that serve as a fully learned scene representation, and decodes novel-view images from them; and (2) a decoder-only LVSM, which maps input images directly to novel-view outputs, eliminating intermediate scene representations entirely. Both models bypass the 3D inductive biases used in previous methods, ranging from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps), and address novel view synthesis with a fully data-driven approach. While the encoder-decoder model offers faster inference because its latent representation is independent of the target view, the decoder-only LVSM achieves superior quality, scalability, and zero-shot generalization, outperforming previous state-of-the-art methods by 1.5 to 3.5 dB PSNR. Comprehensive evaluations across multiple datasets demonstrate that both LVSM variants achieve state-of-the-art novel view synthesis quality. Notably, our models surpass all previous methods even with reduced computational resources (1-2 GPUs). Please see our website for more details: https://haian-jin.github.io/projects/LVSM/
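As a rough illustration of how tokens flow through the two variants, the following NumPy sketch replaces the full transformers with a single untrained self-attention layer. Everything in it is an assumption for illustration only. The function names (`self_attention`, `encode_scene`, `render_from_scene`, `decoder_only_render`) are hypothetical, and so are the token width of 16, the 64 patch tokens per view, and the 32 latent tokens. Random arrays stand in for the input-image tokens and the novel-view query tokens, because the abstract does not specify how those are built. The sketch shows only what the abstract states: the encoder-decoder variant compresses any number of input views into a fixed number of 1D latent tokens that can be reused across target views, while the decoder-only variant lets target queries attend directly to the input tokens.

```python
import numpy as np

def self_attention(x, seed):
    """One toy single-head self-attention layer with a residual connection.
    The projections are fixed random matrices (untrained); they only show data flow."""
    d = x.shape[-1]
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v

# (1) Encoder-decoder variant (hypothetical helper names).
def encode_scene(input_tokens, latent_tokens):
    """Input tokens and latent tokens attend jointly; only the latents are kept, so the
    scene representation has a fixed size no matter how many input views are given."""
    joint = self_attention(np.concatenate([input_tokens, latent_tokens]), seed=1)
    return joint[-latent_tokens.shape[0]:]

def render_from_scene(scene_tokens, target_tokens):
    """Target-view query tokens attend to the latent scene; read out the target tokens."""
    joint = self_attention(np.concatenate([target_tokens, scene_tokens]), seed=2)
    return joint[: target_tokens.shape[0]]

# (2) Decoder-only variant: no intermediate scene representation at all.
def decoder_only_render(input_tokens, target_tokens):
    """Input tokens and target queries are processed together; read out the target tokens."""
    joint = self_attention(np.concatenate([input_tokens, target_tokens]), seed=3)
    return joint[-target_tokens.shape[0]:]

# Hypothetical sizes: token width 16, 64 patch tokens per view, 32 latent tokens.
d, patches = 16, 64
rng = np.random.default_rng(0)
inputs = rng.standard_normal((2 * patches, d))   # tokens from 2 input views
latents = rng.standard_normal((32, d))           # stand-in for learned latent tokens
targets = [rng.standard_normal((patches, d)) for _ in range(3)]  # 3 novel views

scene = encode_scene(inputs, latents)                             # encode once ...
enc_dec_out = [render_from_scene(scene, t) for t in targets]      # ... render many views
dec_only_out = [decoder_only_render(inputs, t) for t in targets]  # re-attend to inputs per view
print(scene.shape, enc_dec_out[0].shape, dec_only_out[0].shape)   # (32, 16) (64, 16) (64, 16)
```

In this toy setup the encoder-decoder path encodes the inputs once and reuses the 32-token scene for every target view, which mirrors the abstract's point about faster inference from an independent latent representation. The decoder-only path avoids that latent bottleneck but reprocesses all input tokens for each new view.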

November 16, 2024