Unleashing Vecset Diffusion Model for Fast Shape Generation
March 20, 2025
Authors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qinxiang Lin, Jinwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue
cs.AI
Abstract
3D shape generation has greatly flourished through the development of
so-called "native" 3D diffusion, particularly through the Vecset Diffusion
Model (VDM). While recent advancements have shown promising results in
generating high-resolution 3D shapes, VDM still struggles with high-speed
generation. The difficulty lies not only in accelerating diffusion sampling
but also in accelerating VAE decoding in VDM, both of which are under-explored in
previous work. To address these challenges, we present FlashVDM, a systematic
framework for accelerating both VAE and DiT in VDM. For DiT, FlashVDM enables
flexible diffusion sampling with as few as 5 inference steps and comparable
quality, which is made possible by stabilizing consistency distillation with
our newly introduced Progressive Flow Distillation. For VAE, we introduce a
lightning vecset decoder equipped with Adaptive KV Selection, Hierarchical
Volume Decoding, and Efficient Network Design. By exploiting the locality of
the vecset and the sparsity of the shape surface in the volume, our decoder
drastically lowers FLOPs, minimizing the overall decoding overhead. We apply
FlashVDM to Hunyuan3D-2 to obtain Hunyuan3D-2 Turbo. Through systematic
evaluation, we show that our model significantly outperforms existing fast 3D
generation methods, achieving comparable performance to the state-of-the-art
while reducing inference time by over 45x for reconstruction and 32x for
generation. Code and models are available at
https://github.com/Tencent/FlashVDM.
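To make the abstract's point about surface sparsity concrete, here is a minimal coarse-to-fine sketch in Python. It is not FlashVDM's decoder and is not taken from the paper's code. It only illustrates the general idea that a shape's surface occupies a small fraction of a volume, so most fine-grid queries can be skipped. Everything named here is an assumption made for illustration: `sphere_sdf` is an analytic stand-in for a learned decoder, and `hierarchical_decode`, `cell_center`, the 16-to-64 grid sizes, and the skip threshold are hypothetical choices. The threshold relies on the toy SDF being 1-Lipschitz, which a learned decoder does not guarantee.

```python
import math


def sphere_sdf(x, y, z, radius=0.5):
    """Toy stand-in for a shape decoder: signed distance to a sphere at the origin."""
    return math.sqrt(x * x + y * y + z * z) - radius


def cell_center(index, res):
    """Map a grid index in [0, res) to a cell-center coordinate in [-1, 1]."""
    return -1.0 + (index + 0.5) * 2.0 / res


def hierarchical_decode(sdf, coarse_res=16, factor=4):
    """Query a coarse grid, then refine only the coarse cells that may touch the surface.

    Returns (fine_values, n_queries), where fine_values maps fine-grid indices
    (i, j, k) to SDF values and n_queries counts every call to sdf.
    """
    fine_res = coarse_res * factor
    # For a 1-Lipschitz SDF, a cell can contain the surface only if the value
    # at its center is within half the cell diagonal of zero.
    threshold = 0.5 * math.sqrt(3.0) * (2.0 / coarse_res)
    fine_values = {}
    n_queries = 0
    for ci in range(coarse_res):
        for cj in range(coarse_res):
            for ck in range(coarse_res):
                d = sdf(cell_center(ci, coarse_res),
                        cell_center(cj, coarse_res),
                        cell_center(ck, coarse_res))
                n_queries += 1
                if abs(d) > threshold:
                    continue  # far from the surface: skip all fine queries here
                for fi in range(ci * factor, (ci + 1) * factor):
                    for fj in range(cj * factor, (cj + 1) * factor):
                        for fk in range(ck * factor, (ck + 1) * factor):
                            fine_values[(fi, fj, fk)] = sdf(
                                cell_center(fi, fine_res),
                                cell_center(fj, fine_res),
                                cell_center(fk, fine_res))
                            n_queries += 1
    return fine_values, n_queries


values, n_queries = hierarchical_decode(sphere_sdf)
print(f"{n_queries} decoder queries instead of {64 ** 3} for a dense 64^3 grid")
```

In this toy setting the fine grid is evaluated only in a thin shell around the sphere, so the query count grows roughly with surface area rather than with volume. The actual savings reported in the abstract come from the authors' full design, which this sketch does not reproduce.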