Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention

Abstract

Recent advances in sparse voxel representations have significantly improved the quality of 3D content generation, enabling high-resolution modeling with fine-grained geometry. However, existing frameworks suffer from severe computational inefficiencies due to the quadratic complexity of attention mechanisms in their two-stage diffusion pipelines. In this work, we propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Our method leverages the compact VecSet representation to efficiently generate a coarse object layout in the first stage, reducing the token count and accelerating voxel coordinate prediction. To refine per-voxel latent features in the second stage, we introduce Part Attention, a geometry-aware localized attention mechanism that restricts attention computation to semantically consistent part regions. This design preserves structural continuity while avoiding unnecessary global attention, achieving up to a 6.7x speed-up in latent generation. To support this mechanism, we construct a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels. Extensive experiments demonstrate that Ultra3D supports 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
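The idea of restricting attention to part regions can be illustrated with a small sketch. This is not the Ultra3D implementation: the function names, the use of the raw features as queries, keys, and values (no learned projections), and the integer part labels are all assumptions made for illustration. The sketch groups tokens by part label, runs ordinary scaled dot-product attention inside each group, and counts how many query-key pairs are evaluated compared with global attention.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Plain scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def part_local_attention(x, part_labels):
    """Self-attention in which a token only attends to tokens with its label.

    x: one feature vector per voxel token, used directly as Q, K and V
       (learned projections are omitted; this is an illustrative assumption).
    part_labels: one integer part label per token (hypothetical annotation).
    """
    groups = defaultdict(list)
    for idx, label in enumerate(part_labels):
        groups[label].append(idx)

    out = [None] * len(x)
    for indices in groups.values():
        block = [x[i] for i in indices]
        # Attention is computed independently inside each part.
        for i, row in zip(indices, attend(block, block, block)):
            out[i] = row
    return out


def attention_pairs(part_labels):
    """Return (global, part-local) counts of query-key pairs."""
    counts = defaultdict(int)
    for label in part_labels:
        counts[label] += 1
    n = len(part_labels)
    return n * n, sum(c * c for c in counts.values())


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
labels = [0, 0, 1, 1]
refined = part_local_attention(tokens, labels)
print(attention_pairs(labels))  # (16, 8)
```

With four tokens split evenly into two parts, the pair count drops from 16 to 8. More generally, N tokens in P equally sized parts need N²/P pairs instead of N². The 6.7x figure reported above is a measured end-to-end speed-up and is not derived from this count.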
