HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
January 15, 2024
Authors: Antoine Mercier, Ramin Nakhli, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger
cs.AI
Abstract
Despite the latest remarkable advances in generative modeling, efficient
generation of high-quality 3D assets from textual prompts remains a difficult
task. A key challenge lies in data scarcity: the most extensive 3D datasets
encompass merely millions of assets, while their 2D counterparts contain
billions of text-image pairs. To address this, we propose a novel approach
which harnesses the power of large, pretrained 2D diffusion models. More
specifically, our approach, HexaGen3D, fine-tunes a pretrained text-to-image
model to jointly predict 6 orthographic projections and the corresponding
latent triplane. We then decode these latents to generate a textured mesh.
HexaGen3D does not require per-sample optimization, and can infer high-quality
and diverse objects from textual prompts in 7 seconds, offering significantly
better quality-to-latency trade-offs compared to existing approaches.
Furthermore, HexaGen3D demonstrates strong generalization to new objects or
compositions.
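
For readers who want a concrete picture of the feed-forward pipeline the abstract describes, below is a minimal, hypothetical PyTorch sketch. All names (HexaGen3DSketch, denoiser, text_proj, mesh_decoder) and the toy modules are assumptions made for illustration: a small convolution stands in for the fine-tuned Stable Diffusion UNet that jointly denoises the 6 orthographic-view latents and the latent triplane, and a linear layer stands in for the triplane-to-mesh decoder. This is not the authors' implementation, only an outline of the data flow.

    # Minimal, hypothetical sketch of a HexaGen3D-style feed-forward pipeline.
    # The tiny modules below are placeholders, not the paper's actual networks.
    import torch
    import torch.nn as nn

    class HexaGen3DSketch(nn.Module):
        """Jointly denoise 6 orthographic-view latents and a latent triplane,
        then decode the triplane into toy mesh vertices (texturing omitted)."""

        def __init__(self, latent_ch=4, view_res=32, plane_res=32, num_verts=1024):
            super().__init__()
            self.latent_ch, self.view_res, self.plane_res = latent_ch, view_res, plane_res
            self.num_verts = num_verts
            # Stand-in for the fine-tuned Stable Diffusion UNet denoiser.
            self.denoiser = nn.Conv2d(latent_ch, latent_ch, kernel_size=3, padding=1)
            # Stand-in for text conditioning (the real model uses cross-attention).
            self.text_proj = nn.Linear(768, latent_ch)
            # Stand-in for the triplane -> textured mesh decoder.
            self.mesh_decoder = nn.Linear(3 * latent_ch * plane_res * plane_res, num_verts * 3)

        @torch.no_grad()
        def forward(self, text_embedding: torch.Tensor, num_steps: int = 20):
            b = text_embedding.shape[0]
            cond = self.text_proj(text_embedding).view(b, 1, self.latent_ch, 1, 1)
            # 6 orthographic views + 3 triplane planes, denoised jointly and
            # feed-forward (no per-sample optimization), which keeps inference fast.
            views = torch.randn(b, 6, self.latent_ch, self.view_res, self.view_res) + cond
            planes = torch.randn(b, 3, self.latent_ch, self.plane_res, self.plane_res) + cond
            for _ in range(num_steps):
                views = views - 0.05 * self.denoiser(views.flatten(0, 1)).unflatten(0, (b, 6))
                planes = planes - 0.05 * self.denoiser(planes.flatten(0, 1)).unflatten(0, (b, 3))
            # Decode the latent triplane into mesh vertices.
            vertices = self.mesh_decoder(planes.flatten(1)).view(b, self.num_verts, 3)
            return views, planes, vertices

    # Usage: a random vector stands in for the output of a real text encoder.
    model = HexaGen3DSketch()
    views, planes, verts = model(torch.randn(1, 768))
    print(views.shape, planes.shape, verts.shape)  # (1, 6, 4, 32, 32), (1, 3, 4, 32, 32), (1, 1024, 3)

The key design point the sketch tries to convey is that both the multi-view latents and the triplane latents are produced by the same denoiser in a fixed number of steps, so the cost per object is a constant forward pass rather than a per-sample optimization loop.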