
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation

October 1, 2024
作者: Junlin Han, Jianyuan Wang, Andrea Vedaldi, Philip Torr, Filippos Kokkinos
cs.AI

Abstract

Generating high-quality 3D content from text, single images, or sparse view images remains a challenging task with broad applications. Existing methods typically employ multi-view diffusion models to synthesize multi-view images, followed by a feed-forward process for 3D reconstruction. However, these approaches are often constrained by a small and fixed number of input views, limiting their ability to capture diverse viewpoints and, even worse, leading to suboptimal generation results if the synthesized views are of poor quality. To address these limitations, we propose Flex3D, a novel two-stage framework capable of leveraging an arbitrary number of high-quality input views. The first stage consists of a candidate view generation and curation pipeline. We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object. Subsequently, a view selection pipeline filters these views based on quality and consistency, ensuring that only the high-quality and reliable views are used for reconstruction. In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs. FlexRM directly outputs 3D Gaussian points leveraging a tri-plane representation, enabling efficient and detailed 3D generation. Through extensive exploration of design and training strategies, we optimize FlexRM to achieve superior performance in both reconstruction and generation tasks. Our results demonstrate that Flex3D achieves state-of-the-art performance, with a user study winning rate of over 92% in 3D generation tasks when compared to several of the latest feed-forward 3D generative models.
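The two-stage flow described above can be sketched in simplified form. This is a minimal illustration, not the paper's implementation: the `View` dataclass, the threshold-based `curate_views` filter, and the `reconstruct` placeholder are all hypothetical stand-ins for the actual diffusion models, view-selection pipeline, and FlexRM network.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class View:
    """Hypothetical stand-in for one candidate rendering of the target object."""
    image_id: int
    quality: float      # e.g. a score from an image-quality classifier (assumed)
    consistency: float  # e.g. agreement with the other candidate views (assumed)

def curate_views(candidates: List[View],
                 quality_thresh: float = 0.5,
                 consistency_thresh: float = 0.5) -> List[View]:
    """Stage 1 (sketch): keep only views passing both quality and consistency checks."""
    return [v for v in candidates
            if v.quality >= quality_thresh and v.consistency >= consistency_thresh]

def reconstruct(views: List[View]) -> str:
    """Stage 2 (placeholder): FlexRM accepts an arbitrary number of curated views.
    The real model outputs 3D Gaussian points; here we only report the count."""
    return f"reconstructed from {len(views)} views"

if __name__ == "__main__":
    # A candidate pool of four views; two fail the filters and are discarded.
    pool = [View(0, 0.9, 0.8), View(1, 0.3, 0.9),
            View(2, 0.7, 0.2), View(3, 0.8, 0.9)]
    print(reconstruct(curate_views(pool)))
```

The key property the sketch preserves is that the reconstruction stage takes a variable-length list of views rather than a fixed number, which is what distinguishes FlexRM from fixed-view feed-forward reconstructors.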

