Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
July 23, 2025
Authors: Yiwen Chen, Zhihao Li, Yikai Wang, Hu Zhang, Qin Li, Chi Zhang, Guosheng Lin
cs.AI
Abstract
Recent advances in sparse voxel representations have significantly improved
the quality of 3D content generation, enabling high-resolution modeling with
fine-grained geometry. However, existing frameworks suffer from severe
computational inefficiencies due to the quadratic complexity of attention
mechanisms in their two-stage diffusion pipelines. In this work, we propose
Ultra3D, an efficient 3D generation framework that significantly accelerates
sparse voxel modeling without compromising quality. Our method leverages the
compact VecSet representation to efficiently generate a coarse object layout in
the first stage, reducing token count and accelerating voxel coordinate
prediction. To refine per-voxel latent features in the second stage, we
introduce Part Attention, a geometry-aware localized attention mechanism that
restricts attention computation within semantically consistent part regions.
This design preserves structural continuity while avoiding unnecessary global
attention, achieving up to 6.7x speed-up in latent generation. To support this
mechanism, we construct a scalable part annotation pipeline that converts raw
meshes into part-labeled sparse voxels. Extensive experiments demonstrate that
Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves
state-of-the-art performance in both visual fidelity and user preference.
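The core idea behind Part Attention, as described above, is to restrict attention so each voxel token only attends to tokens carrying the same part label, replacing one global N×N attention with several smaller per-part attentions. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the function names, the single-head layout, and the dense per-part attention are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def part_attention(q, k, v, part_ids):
    """Attention restricted to semantically consistent part regions (sketch).

    q, k, v: (N, d) per-token features; part_ids: (N,) integer part labels
    (hypothetical interface, produced here in place of the paper's part
    annotation pipeline). Each token attends only to tokens in its own part,
    so the cost is the sum of squared part sizes instead of N^2.
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for p in np.unique(part_ids):
        idx = np.where(part_ids == p)[0]
        # Scaled dot-product attention computed within one part only.
        scores = q[idx] @ k[idx].T / np.sqrt(d)
        out[idx] = softmax(scores, axis=-1) @ v[idx]
    return out
```

With part sizes n_1, ..., n_P summing to N, the attention cost drops from O(N^2) to O(sum n_i^2), which is the source of the reported speed-up when parts are much smaller than the whole object.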