Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
May 12, 2025
Authors: Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, Xiao Chen, Feipeng Tian, Jianxiong Pan, Zeming Li, Gang Yu, Xiangyu Zhang, Daxin Jiang, Ping Tan
cs.AI
Abstract
While generative artificial intelligence has advanced significantly across
text, image, audio, and video domains, 3D generation remains comparatively
underdeveloped due to fundamental challenges such as data scarcity, algorithmic
limitations, and ecosystem fragmentation. To this end, we present Step1X-3D, an
open framework addressing these challenges through: (1) a rigorous data
curation pipeline that processes more than 5M assets to create a high-quality
dataset of 2M assets with standardized geometric and textural properties; (2) a
two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator
with a diffusion-based texture synthesis module; and (3) the full open-source release
of models, training code, and adaptation modules. For geometry generation, the
hybrid VAE-DiT component produces TSDF representations by employing
perceiver-based latent encoding with sharp edge sampling for detail
preservation. The diffusion-based texture synthesis module then ensures
cross-view consistency through geometric conditioning and latent-space
synchronization. Benchmark results demonstrate state-of-the-art performance
that exceeds existing open-source methods, while also achieving competitive
quality with proprietary solutions. Notably, the framework uniquely bridges the
2D and 3D generation paradigms by supporting direct transfer of 2D control
techniques (e.g., LoRA) to 3D synthesis. By simultaneously advancing data
quality, algorithmic fidelity, and reproducibility, Step1X-3D aims to establish
new standards for open research in controllable 3D asset generation.
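The abstract states that the geometry branch produces TSDF representations. The sketch below assumes the common definition of a TSDF: a signed distance clamped to a narrow band [-tau, tau], negative inside the surface and positive outside. It evaluates a truncated SDF for an analytic sphere on a small grid. The sphere, the band width `tau`, and the helper names are hypothetical illustrations, not Step1X-3D's actual preprocessing.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]

def sphere_sdf(p: Point, center: Point = (0.0, 0.0, 0.0), radius: float = 0.5) -> float:
    """Exact signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(p, center) - radius

def truncate(sdf_value: float, tau: float) -> float:
    """Clamp a signed distance to the band [-tau, tau] (the 'truncated' part of a TSDF)."""
    return max(-tau, min(tau, sdf_value))

def sample_tsdf_grid(sdf_fn: Callable[[Point], float], resolution: int, tau: float,
                     bound: float = 1.0) -> List[List[List[float]]]:
    """Evaluate a truncated SDF on a resolution^3 grid covering [-bound, bound]^3, indexed [x][y][z]."""
    step = 2.0 * bound / (resolution - 1)
    coords = [-bound + i * step for i in range(resolution)]
    return [[[truncate(sdf_fn((x, y, z)), tau) for z in coords] for y in coords] for x in coords]

# Hypothetical settings: a 5^3 grid (coordinates -1, -0.5, 0, 0.5, 1) and a band of 0.1.
grid = sample_tsdf_grid(sphere_sdf, resolution=5, tau=0.1)
print(grid[2][2][2], grid[3][2][2], grid[0][0][0])  # -0.1 0.0 0.1 (inside, on surface, outside)
```

Clamping makes every cell far from the surface saturate at plus or minus `tau`. Only cells near the zero level set carry fine-grained values. This is one common reason to prefer a TSDF over an untruncated SDF as a learning target.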
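In its usual form, perceiver-based latent encoding lets a fixed set of latent query vectors cross-attend to an arbitrarily large set of input features. A point cloud of any size is thereby compressed into a fixed-size latent. The plain-Python sketch below shows single-head scaled dot-product cross-attention under several assumptions: identity key/value projections, raw xyz coordinates as point features, and hand-picked latents. A real encoder would learn all of these. Sharp edge sampling is not modeled here, and `cross_attend` is a hypothetical helper, not part of any released API.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(latents: List[Vector], points: List[Vector]) -> List[Vector]:
    """Single-head cross-attention: each latent query attends over all point features.

    Keys and values are the point features themselves (identity projections, an assumption),
    so latents and points must share the same dimension.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in latents:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in points]
        weights = softmax(scores)
        # Each output is a convex combination of the point features.
        out.append([sum(w * v[d] for w, v in zip(weights, points)) for d in range(dim)])
    return out

# Hypothetical "learned" latent queries and two point clouds of different sizes.
latents = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
small_cloud = [[0.1 * i, 0.0, 0.5] for i in range(8)]
large_cloud = [[0.01 * i, 0.2, -0.3] for i in range(500)]
print(len(cross_attend(latents, small_cloud)), len(cross_attend(latents, large_cloud)))  # 3 3
```

The output size depends only on the number of latents, not on the number of input points. This property is what makes the design suitable for encoding meshes sampled at varying densities.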
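The abstract says that 2D control techniques such as LoRA transfer directly to 3D synthesis, but it does not say where the adapters are attached. For reference, the standard LoRA update replaces a frozen weight W with W + (alpha / r) * B A. Here A is r x d_in, B is d_out x r, and only A and B are trained. The plain-Python sketch below uses hypothetical matrix helpers and values. It checks that applying the adapter alongside W gives the same output as merging it into W.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a: Matrix, s: float) -> Matrix:
    """Multiply every entry of a matrix by s."""
    return [[s * x for x in row] for row in a]

def lora_forward(x: Matrix, W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    r = len(A)  # adapter rank = number of rows of A
    base = matmul(W, x)
    update = matmul(B, matmul(A, x))
    return add(base, scale(update, alpha / r))

def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """Fold the low-rank update into the base weight: W' = W + (alpha / r) * B A."""
    r = len(A)
    return add(W, scale(matmul(B, A), alpha / r))

# Hypothetical rank-1 adapter on a 2x3 layer.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B = [[1.0], [2.0]]
x = [[1.0], [2.0], [3.0]]
print(lora_forward(x, W, A, B, alpha=1.0))            # [[13.0], [14.0]]
print(matmul(merge_lora(W, A, B, alpha=1.0), x))      # [[13.0], [14.0]]
```

An adapter adds r * (d_in + d_out) trainable parameters instead of d_in * d_out. In the original LoRA formulation, B is initialized to zero, so training starts from the unmodified base model. Both properties help explain why such adapters are cheap to train and share across models with a common backbone.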