
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

September 5, 2024
Authors: Slava Elizarov, Ciara Rowles, Simon Donné
cs.AI

Abstract

Generating high-quality 3D objects from textual descriptions remains a challenging problem due to computational cost, the scarcity of 3D data, and the complexity of 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that uses geometry images to represent 3D shapes efficiently as 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion. This enables strong generalization even with limited 3D training data (allowing us to train on only high-quality data) while retaining compatibility with guidance techniques such as IPAdapter. In short, GIMDiffusion generates 3D assets at speeds comparable to current Text-to-Image models. The generated objects consist of semantically meaningful, separate parts and include internal structures, enhancing both usability and versatility.
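As background for readers unfamiliar with the representation: a geometry image stores a surface as a regular 2D grid whose pixels hold 3D point coordinates, so a generated image can be lifted to a mesh simply by connecting neighboring pixels. The sketch below is a minimal NumPy illustration of that idea, not code from the paper; `geometry_image_to_mesh` is a hypothetical helper, and it assumes a single-chart geometry image in which every pixel is a valid surface point (the multi-chart atlases used by GIMDiffusion would additionally require per-chart triangulation).

```python
import numpy as np

def geometry_image_to_mesh(geo_img: np.ndarray):
    """Lift an (H, W, 3) geometry image -- each pixel an (x, y, z)
    surface point -- to a triangle mesh.

    Returns (vertices, faces): vertices is (H*W, 3) and faces is
    (2*(H-1)*(W-1), 3), with indices into the vertex array.
    Assumes a single chart with all pixels valid (hypothetical
    simplification; not the paper's multi-chart pipeline).
    """
    h, w, _ = geo_img.shape
    vertices = geo_img.reshape(-1, 3)

    # Split each pixel quad on the regular grid into two triangles.
    faces = []
    for r in range(h - 1):
        for c in range(w - 1):
            i = r * w + c                            # top-left corner
            faces.append([i, i + w, i + 1])          # lower-left tri
            faces.append([i + 1, i + w, i + w + 1])  # upper-right tri
    return vertices, np.asarray(faces, dtype=np.int64)

# Example: a flat 4x4 grid in the XY plane.
ys, xs = np.mgrid[0:4, 0:4].astype(np.float32)
geo = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)   # (4, 4, 3)
verts, faces = geometry_image_to_mesh(geo)
print(verts.shape, faces.shape)                        # (16, 3) (18, 3)
```

Because the grid connectivity is fixed, all geometric variation lives in the pixel values themselves, which is what lets an ordinary 2D diffusion model generate 3D surfaces without any 3D-aware layers.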