Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
May 12, 2025
Authors: Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, Xiao Chen, Feipeng Tian, Jianxiong Pan, Zeming Li, Gang Yu, Xiangyu Zhang, Daxin Jiang, Ping Tan
cs.AI
Abstract
While generative artificial intelligence has advanced significantly across
text, image, audio, and video domains, 3D generation remains comparatively
underdeveloped due to fundamental challenges such as data scarcity, algorithmic
limitations, and ecosystem fragmentation. To this end, we present Step1X-3D, an
open framework addressing these challenges through: (1) a rigorous data
curation pipeline processing >5M assets to create a 2M high-quality dataset
with standardized geometric and textural properties; (2) a two-stage 3D-native
architecture combining a hybrid VAE-DiT geometry generator with a
diffusion-based texture synthesis module; and (3) the full open-source release
of models, training code, and adaptation modules. For geometry generation, the
hybrid VAE-DiT component produces TSDF representations by employing
perceiver-based latent encoding with sharp edge sampling for detail
preservation. The diffusion-based texture synthesis module then ensures
cross-view consistency through geometric conditioning and latent-space
synchronization. Benchmark results demonstrate state-of-the-art performance
that exceeds existing open-source methods, while also achieving competitive
quality with proprietary solutions. Notably, the framework uniquely bridges the
2D and 3D generation paradigms by supporting direct transfer of 2D control
techniques (e.g., LoRA) to 3D synthesis. By simultaneously advancing data
quality, algorithmic fidelity, and reproducibility, Step1X-3D aims to establish
new standards for open research in controllable 3D asset generation.
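
For readers unfamiliar with the TSDF (truncated signed distance function) representation mentioned above, the following is a minimal illustrative sketch, not the Step1X-3D implementation. It assumes a unit sphere as the shape, a sign convention of negative inside and positive outside, and a hypothetical truncation distance of 0.1; the abstract does not state which conventions or values the framework actually uses.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from a point to a sphere surface.

    Assumed convention: negative inside the sphere, positive outside.
    """
    return math.dist(point, center) - radius


def tsdf(signed_distance, truncation=0.1):
    """Scale a signed distance by the truncation band and clamp to [-1, 1].

    Only points within `truncation` of the surface keep a graded value;
    everything farther away saturates at -1 (inside) or +1 (outside).
    """
    return max(-1.0, min(1.0, signed_distance / truncation))


# Query points: deep inside, just inside, on the surface, far outside.
samples = [(0.0, 0.0, 0.0), (0.95, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
values = [tsdf(sphere_sdf(p)) for p in samples]
```

Because the field saturates away from the surface, its useful variation is concentrated in a thin band around the geometry, and the surface itself is the zero level set of the field.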