Unleashing Vecset Diffusion Model for Fast Shape Generation
March 20, 2025
Authors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qinxiang Lin, Jinwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue
cs.AI
Abstract
3D shape generation has greatly flourished through the development of
so-called "native" 3D diffusion, particularly through the Vecset Diffusion
Model (VDM). While recent advancements have shown promising results in
generating high-resolution 3D shapes, VDM still struggles with high-speed
generation. The difficulty lies not only in accelerating diffusion sampling
but also in accelerating VAE decoding in VDM, both of which are under-explored
in previous works. To address these challenges, we present FlashVDM, a systematic
framework for accelerating both VAE and DiT in VDM. For DiT, FlashVDM enables
flexible diffusion sampling with as few as 5 inference steps and comparable
quality, which is made possible by stabilizing consistency distillation with
our newly introduced Progressive Flow Distillation. For VAE, we introduce a
lightning vecset decoder equipped with Adaptive KV Selection, Hierarchical
Volume Decoding, and Efficient Network Design. By exploiting the locality of
the vecset and the sparsity of the shape surface in the volume, our decoder
drastically lowers FLOPs, minimizing the overall decoding overhead. We apply
FlashVDM to Hunyuan3D-2 to obtain Hunyuan3D-2 Turbo. Through systematic
evaluation, we show that our model significantly outperforms existing fast 3D
generation methods, achieving comparable performance to the state-of-the-art
while reducing inference time by over 45x for reconstruction and 32x for
generation. Code and models are available at
https://github.com/Tencent/FlashVDM.
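
The abstract attributes part of the decoder speedup to the sparsity of the shape surface in the volume. The sketch below illustrates that general coarse-to-fine idea only; it is not the FlashVDM implementation. It assumes the decoder can be queried pointwise for a signed distance value, and it uses a toy sphere (`sphere_sdf`) in place of a learned vecset decoder. All function names and parameters here are hypothetical.

```python
import numpy as np


def sphere_sdf(points, radius=0.5):
    """Toy stand-in for a learned decoder: signed distance to a sphere."""
    return np.linalg.norm(points, axis=-1) - radius


def cell_centers(index, res):
    """Map integer cell indices (N, 3) to cell-center coordinates in [-1, 1]^3."""
    return -1.0 + (index + 0.5) * (2.0 / res)


def hierarchical_decode(query_fn, coarse_res=16, levels=2):
    """Decode densely at the coarsest level, then refine only near the surface."""
    res = coarse_res
    grid = np.meshgrid(*[np.arange(res)] * 3, indexing="ij")
    idx = np.stack(grid, axis=-1).reshape(-1, 3)
    volume = query_fn(cell_centers(idx, res)).reshape(res, res, res)
    n_queries = idx.shape[0]

    # The eight child offsets of a cell when the resolution doubles.
    offsets = np.stack(
        np.meshgrid(*[np.arange(2)] * 3, indexing="ij"), axis=-1
    ).reshape(-1, 3)

    for _ in range(levels):
        # Assumption: query_fn is a true (1-Lipschitz) signed distance, so a cell
        # can contain the surface only if |sdf(center)| <= half the cell diagonal.
        half_diag = np.sqrt(3.0) * (2.0 / res) / 2.0
        active = np.argwhere(np.abs(volume) <= half_diag)

        # Cells far from the surface simply inherit their parent's value.
        volume = volume.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)
        res *= 2

        # Only the children of near-surface cells are sent to the decoder.
        children = (active[:, None, :] * 2 + offsets[None, :, :]).reshape(-1, 3)
        values = query_fn(cell_centers(children, res))
        volume[children[:, 0], children[:, 1], children[:, 2]] = values
        n_queries += children.shape[0]

    return volume, n_queries


if __name__ == "__main__":
    volume, n_queries = hierarchical_decode(sphere_sdf, coarse_res=16, levels=2)
    dense_queries = volume.size
    print(f"queries used: {n_queries} of {dense_queries} for a dense grid")
```

Under the signed-distance assumption, the inside/outside sign of every cell matches a dense evaluation at the final resolution, while the decoder is queried on only a fraction of the cells. A learned decoder does not generally satisfy the Lipschitz bound exactly, so a practical system would need a more conservative criterion for selecting cells to refine.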