PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Abstract

We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image. Existing methods either produce monolithic 3D shapes or follow two-stage pipelines that first segment an image and then reconstruct each segment. In contrast, PartCrafter adopts a unified, compositional generation architecture that does not rely on pre-segmented inputs. Conditioned on a single image, it denoises multiple 3D parts simultaneously, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes.

PartCrafter builds upon a pretrained 3D mesh diffusion transformer (DiT) trained on whole objects, inheriting its pretrained weights, encoder, and decoder, and introduces two key innovations: (1) a compositional latent space, in which each 3D part is represented by its own set of disentangled latent tokens; and (2) a hierarchical attention mechanism that enables structured information flow both within individual parts and across all parts, ensuring global coherence while preserving part-level detail during generation.

To support part-level supervision, we curate a new dataset by mining part-level annotations from large-scale 3D object datasets. Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes, including parts that are not directly visible in the input image, demonstrating the strength of part-aware generative priors for 3D understanding and synthesis. Code and training data will be released.
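The two ideas named in the abstract, a separate token set per part and attention that flows both within a part and across all parts, can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`local_attention`, `global_attention`, `hierarchical_block`) are hypothetical, attention is single-head with identity projections instead of learned weights, and applying the local step before the global step is an assumption made only for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    # Single-head self-attention with identity projections (no learned weights).
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def local_attention(parts):
    # Each part's tokens attend only to tokens of the same part.
    return [self_attention(part) for part in parts]


def global_attention(parts):
    # Tokens of all parts are flattened into one sequence, attend jointly,
    # and are then regrouped into their original parts.
    flat = [tok for part in parts for tok in part]
    mixed = self_attention(flat)
    out, start = [], 0
    for part in parts:
        out.append(mixed[start:start + len(part)])
        start += len(part)
    return out


def hierarchical_block(parts):
    # Assumed ordering for this sketch: within-part mixing, then cross-part mixing.
    return global_attention(local_attention(parts))


if __name__ == "__main__":
    # Two parts with one 2-D latent token each (toy values).
    parts = [[[1.0, 0.0]], [[0.0, 1.0]]]
    print(local_attention(parts))   # each part sees only itself
    print(global_attention(parts))  # each part also receives the other part's signal
```

In this toy setting, the local step leaves single-token parts unchanged, while the global step mixes information between parts. That mirrors the division of labor the abstract describes: part-level detail from within-part attention and global coherence from cross-part attention.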
