Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
July 23, 2025
Authors: Yiwen Chen, Zhihao Li, Yikai Wang, Hu Zhang, Qin Li, Chi Zhang, Guosheng Lin
cs.AI
Abstract
Recent advances in sparse voxel representations have significantly improved
the quality of 3D content generation, enabling high-resolution modeling with
fine-grained geometry. However, existing frameworks suffer from severe
computational inefficiencies due to the quadratic complexity of attention
mechanisms in their two-stage diffusion pipelines. In this work, we propose
Ultra3D, an efficient 3D generation framework that significantly accelerates
sparse voxel modeling without compromising quality. Our method leverages the
compact VecSet representation to efficiently generate a coarse object layout in
the first stage, reducing token count and accelerating voxel coordinate
prediction. To refine per-voxel latent features in the second stage, we
introduce Part Attention, a geometry-aware localized attention mechanism that
restricts attention computation within semantically consistent part regions.
This design preserves structural continuity while avoiding unnecessary global
attention, achieving up to 6.7x speed-up in latent generation. To support this
mechanism, we construct a scalable part annotation pipeline that converts raw
meshes into part-labeled sparse voxels. Extensive experiments demonstrate that
Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves
state-of-the-art performance in both visual fidelity and user preference.
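The core idea behind Part Attention, as described above, is to restrict attention so each voxel token only attends to tokens carrying the same part label, replacing one global N×N attention with several smaller per-part attentions. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the function names, the single-head layout, and the dense per-part attention are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def part_attention(q, k, v, part_ids):
    """Attention restricted to semantically consistent part regions (sketch).

    q, k, v: (N, d) per-token features; part_ids: (N,) integer part labels
    (hypothetical interface, produced here in place of the paper's part
    annotation pipeline). Each token attends only to tokens in its own part,
    so the cost is the sum of squared part sizes instead of N^2.
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for p in np.unique(part_ids):
        idx = np.where(part_ids == p)[0]
        # Scaled dot-product attention computed within one part only.
        scores = q[idx] @ k[idx].T / np.sqrt(d)
        out[idx] = softmax(scores, axis=-1) @ v[idx]
    return out
```

With part sizes n_1, ..., n_P summing to N, the attention cost drops from O(N^2) to O(sum n_i^2), which is the source of the reported speed-up when parts are much smaller than the whole object.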