
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

September 5, 2024
Authors: Slava Elizarov, Ciara Rowles, Simon Donné
cs.AI

Abstract

Generating high-quality 3D objects from textual descriptions remains a challenging problem due to computational cost, the scarcity of 3D data, and complex 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that utilizes geometry images to efficiently represent 3D shapes as 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion. This enables strong generalization even with limited 3D training data (allowing us to train on only high-quality data) while retaining compatibility with guidance techniques such as IPAdapter. In short, GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models. The generated objects consist of semantically meaningful, separate parts and include internal structures, enhancing both usability and versatility.
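To make the geometry-image idea concrete: a geometry image stores per-pixel XYZ surface positions in a regular 2D grid, so a mesh can be recovered by connecting neighboring pixels. The sketch below is not from the paper (the function name and the simple single-chart, regular-grid meshing scheme are assumptions for illustration); it shows only the basic lifting from a 2D array back to a triangle mesh.

```python
import numpy as np

def geometry_image_to_mesh(geom_img: np.ndarray):
    """Lift an H x W x 3 geometry image (per-pixel XYZ surface positions)
    to a triangle mesh by connecting 4-neighboring pixels.

    Returns (vertices, faces): vertices is (H*W, 3) float array,
    faces is (F, 3) integer indices into vertices.
    """
    h, w, _ = geom_img.shape
    vertices = geom_img.reshape(-1, 3)

    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            # Indices of the four corners of one pixel-aligned quad.
            i00 = y * w + x
            i01 = y * w + (x + 1)
            i10 = (y + 1) * w + x
            i11 = (y + 1) * w + (x + 1)
            # Split the quad into two triangles.
            faces.append((i00, i10, i01))
            faces.append((i01, i10, i11))
    return vertices, np.asarray(faces, dtype=np.int64)

# Example: a flat 4x4 grid as a trivial "geometry image".
if __name__ == "__main__":
    ys, xs = np.mgrid[0:4, 0:4]
    flat = np.stack([xs, ys, np.zeros_like(xs)], axis=-1).astype(np.float32)
    v, f = geometry_image_to_mesh(flat)
    print(v.shape, f.shape)  # (16, 3) (18, 3)
```

Because the mesh connectivity is implied entirely by the pixel grid, a diffusion model only has to generate the image itself, which is what lets GIMDiffusion reuse 2D Text-to-Image architectures unchanged.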