
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

February 20, 2024
Authors: Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan
cs.AI

Abstract

This paper presents MVDiffusion++, a neural architecture for 3D object reconstruction that synthesizes dense, high-resolution views of an object from one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) a "pose-free architecture," in which standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) a "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense, high-resolution view synthesis at test time. We use Objaverse for training and Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application by combining MVDiffusion++ with a text-to-image generative model.
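
The abstract's two ideas map naturally onto code. Below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: module and function names (MultiViewSelfAttention, view_dropout) are illustrative. Plain self-attention over the concatenated latent tokens of all views stands in for the pose-free architecture, and random subsampling of generation views stands in for view dropout.

```python
# Minimal sketch of the two ideas described in the abstract.
# All names here are illustrative assumptions, not the paper's actual API.
import torch
import torch.nn as nn

class MultiViewSelfAttention(nn.Module):
    """Pose-free attention: latent tokens from every view are concatenated
    into one sequence, so standard self-attention can learn cross-view
    3D consistency without any camera-pose input."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, num_views, tokens_per_view, dim)
        b, v, t, d = latents.shape
        x = latents.reshape(b, v * t, d)   # flatten all views into one sequence
        out, _ = self.attn(x, x, x)        # plain self-attention, no poses
        return out.reshape(b, v, t, d)

def view_dropout(gen_latents: torch.Tensor, keep: int) -> torch.Tensor:
    """View dropout: during training, keep only a random subset of the
    generation (output) views, shrinking the attention sequence and thus
    the memory footprint; at test time all views are kept."""
    b, v, t, d = gen_latents.shape
    idx = torch.randperm(v)[:keep]         # random subset of output views
    return gen_latents[:, idx]

if __name__ == "__main__":
    cond = torch.randn(2, 1, 64, 320)            # 1 condition view
    gen = torch.randn(2, 32, 64, 320)            # 32 generation views
    gen = view_dropout(gen, keep=8)              # training-time subsampling
    attn = MultiViewSelfAttention(dim=320)
    out = attn(torch.cat([cond, gen], dim=1))    # joint attention over all views
    print(out.shape)                             # torch.Size([2, 9, 64, 320])
```

Under these assumptions, the training-time attention sequence covers only the condition views plus the kept subset of generation views, which is what reduces memory; at test time no views are dropped, so the same weights serve dense, high-resolution synthesis.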

