

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

June 5, 2025
Authors: Yuchen Lin, Chenguo Lin, Panwang Pan, Honglei Yan, Yiqiang Feng, Yadong Mu, Katerina Fragkiadaki
cs.AI

Abstract

We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image. Unlike existing methods that either produce monolithic 3D shapes or follow two-stage pipelines, i.e., first segmenting an image and then reconstructing each segment, PartCrafter adopts a unified, compositional generation architecture that does not rely on pre-segmented inputs. Conditioned on a single image, it simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes. PartCrafter builds upon a pretrained 3D mesh diffusion transformer (DiT) trained on whole objects, inheriting the pretrained weights, encoder, and decoder, and introduces two key innovations: (1) A compositional latent space, where each 3D part is represented by a set of disentangled latent tokens; (2) A hierarchical attention mechanism that enables structured information flow both within individual parts and across all parts, ensuring global coherence while preserving part-level detail during generation. To support part-level supervision, we curate a new dataset by mining part-level annotations from large-scale 3D object datasets. Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes, including parts that are not directly visible in input images, demonstrating the strength of part-aware generative priors for 3D understanding and synthesis. Code and training data will be released.
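The two architectural ideas above lend themselves to a compact illustration. Below is a minimal PyTorch sketch of the hierarchical attention pattern the abstract describes: each 3D part keeps its own set of latent tokens (the compositional latent space), local attention mixes tokens within a single part, and global attention mixes tokens across all parts for coherence. All class names, tensor shapes, and the residual/normalization layout here are illustrative assumptions, not PartCrafter's released implementation.

```python
# Sketch of hierarchical attention over per-part latent tokens.
# Assumed shapes and module layout; PartCrafter's actual DiT blocks
# may compose these operations differently.
import torch
import torch.nn as nn


class HierarchicalPartAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Local attention mixes tokens within one part.
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Global attention mixes tokens across all parts.
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_local = nn.LayerNorm(dim)
        self.norm_global = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_parts, tokens_per_part, dim) -- one disentangled
        # set of latent tokens per 3D part.
        b, p, t, d = x.shape

        # 1) Local stage: fold the part axis into the batch axis so each
        #    part attends only over its own tokens.
        local = x.reshape(b * p, t, d)
        h = self.norm_local(local)
        local = local + self.local_attn(h, h, h, need_weights=False)[0]

        # 2) Global stage: flatten parts into one sequence so every token
        #    can attend to every other part's tokens.
        glob = local.reshape(b, p * t, d)
        h = self.norm_global(glob)
        glob = glob + self.global_attn(h, h, h, need_weights=False)[0]

        return glob.reshape(b, p, t, d)


# Usage: 4 parts, 64 latent tokens each, 512-dim features.
x = torch.randn(2, 4, 64, 512)
out = HierarchicalPartAttention(512)(x)
print(out.shape)  # torch.Size([2, 4, 64, 512])
```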