LATTICE: Democratize High-Fidelity 3D Generation at Scale
November 24, 2025
Authors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Qingxiang Lin, Jingwei Huang, Chunchao Guo, Xiangyu Yue
cs.AI
Abstract
We present LATTICE, a new framework for high-fidelity 3D asset generation that bridges the quality and scalability gap between 3D and 2D generative models. While 2D image synthesis benefits from fixed spatial grids and well-established transformer architectures, 3D generation remains fundamentally more challenging because it must predict both spatial structure and detailed geometric surfaces from scratch. These challenges are exacerbated by the computational complexity of existing 3D representations and the lack of structured, scalable 3D asset encoding schemes. To address this, we propose VoxSet, a semi-structured representation that compresses 3D assets into a compact set of latent vectors anchored to a coarse voxel grid, enabling efficient and position-aware generation. VoxSet retains the simplicity and compression advantages of prior VecSet methods while introducing explicit structure into the latent space, allowing positional embeddings to guide generation and enabling strong token-level test-time scaling. Built upon this representation, LATTICE adopts a two-stage pipeline: it first generates a sparse voxelized geometry anchor, then produces detailed geometry using a rectified flow transformer. Our method is simple at its core, yet it supports arbitrary-resolution decoding, low-cost training, and flexible inference schemes. It achieves state-of-the-art performance on various aspects and offers a significant step toward scalable, high-quality 3D asset creation.
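The two-stage idea in the abstract (coarse voxel anchors first, then latents generated by a flow model) can be illustrated with a toy sketch. This is not the authors' implementation: every function name here (`voxel_anchors`, `anchor_centers`, `euler_sample`, `toy_velocity`) is hypothetical, the points are assumed to lie in the unit cube, and the learned rectified flow transformer is replaced by a closed-form velocity field that simply moves each noise sample toward its anchor center. It only shows the shape of the pipeline: one latent token per occupied coarse voxel, each obtained by integrating a velocity field from noise at t=0 to a sample at t=1.

```python
import random


def voxel_anchors(points, resolution):
    """Return the sorted, unique coarse-voxel indices occupied by the points.

    Assumption: every point lies in the unit cube [0, 1)^3.
    """
    occupied = set()
    for point in points:
        idx = tuple(min(int(c * resolution), resolution - 1) for c in point)
        occupied.add(idx)
    return sorted(occupied)


def anchor_centers(anchors, resolution):
    """Map voxel indices to the coordinates of their voxel centers."""
    return [tuple((i + 0.5) / resolution for i in idx) for idx in anchors]


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target):
    """Stand-in for a learned network (assumption, illustration only).

    Returns the velocity of the straight-line path that reaches `target`
    at t=1, which is the kind of path a rectified flow is trained to follow.
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Stage 1: coarse, sparse voxel anchors from a tiny point set.
points = [(0.05, 0.05, 0.05), (0.06, 0.04, 0.05), (0.95, 0.5, 0.5)]
anchors = voxel_anchors(points, resolution=4)
centers = anchor_centers(anchors, resolution=4)

# Stage 2: one latent per anchor, sampled by integrating from Gaussian noise.
rng = random.Random(0)
latents = []
for center in centers:
    noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
    latents.append(euler_sample(toy_velocity(center), noise, num_steps=8))
```

In a real system the velocity would come from a trained network conditioned on the anchor's positional embedding and the input image or prompt, and each latent would be a high-dimensional feature vector rather than a 3D point. Because the number of tokens follows the number of occupied voxels, a finer anchor grid yields more tokens at inference time, which is one plausible reading of the token-level test-time scaling mentioned in the abstract.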