Unleashing Vecset Diffusion Model for Fast Shape Generation

March 20, 2025
Authors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qinxiang Lin, Jinwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue
cs.AI

Abstract

3D shape generation has flourished through the development of so-called "native" 3D diffusion, particularly the Vecset Diffusion Model (VDM). While recent advances have shown promising results in generating high-resolution 3D shapes, VDM still struggles with high-speed generation. The difficulty lies not only in accelerating diffusion sampling but also in accelerating variational autoencoder (VAE) decoding in VDM, both of which are under-explored in previous work. To address these challenges, we present FlashVDM, a systematic framework for accelerating both the VAE and the diffusion transformer (DiT) in VDM. For the DiT, FlashVDM enables flexible diffusion sampling with as few as 5 inference steps at comparable quality, which is made possible by stabilizing consistency distillation with our newly introduced Progressive Flow Distillation. For the VAE, we introduce a lightning vecset decoder equipped with Adaptive KV Selection, Hierarchical Volume Decoding, and Efficient Network Design. By exploiting the locality of the vecset and the sparsity of the shape surface in the volume, our decoder drastically lowers FLOPs, minimizing the overall decoding overhead. We apply FlashVDM to Hunyuan3D-2 to obtain Hunyuan3D-2 Turbo. Through systematic evaluation, we show that our model significantly outperforms existing fast 3D generation methods, achieving performance comparable to the state of the art while reducing inference time by over 45x for reconstruction and 32x for generation. Code and models are available at https://github.com/Tencent/FlashVDM.
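
To make the surface-sparsity argument concrete, the following is a minimal coarse-to-fine sketch of the general idea behind decoding a volume hierarchically. It is not the FlashVDM decoder: the abstract does not describe the actual algorithm, so everything here is an assumption made for illustration. In particular, `sphere_sdf` is a toy stand-in for a learned decoder, the names `dense_decode` and `hierarchical_decode` are hypothetical, and the refinement test assumes the field is a true signed distance function (1-Lipschitz), which a learned field need not be.

```python
import numpy as np


def sphere_sdf(points, radius=0.5):
    """Toy stand-in for a learned decoder: signed distance to a sphere."""
    return np.linalg.norm(points, axis=-1) - radius


def grid_points(res):
    """Centers of a res^3 voxel grid covering the cube [-1, 1]^3."""
    centers = -1.0 + (np.arange(res) + 0.5) * (2.0 / res)
    return np.stack(np.meshgrid(centers, centers, centers, indexing="ij"), axis=-1)


def upsample(volume, factor):
    """Repeat every coarse cell `factor` times along each of the three axes."""
    for axis in range(3):
        volume = np.repeat(volume, factor, axis=axis)
    return volume


def dense_decode(field, res):
    """Baseline: query the field at every voxel center."""
    values = field(grid_points(res).reshape(-1, 3)).reshape(res, res, res)
    return values, res ** 3


def hierarchical_decode(field, coarse_res=16, factor=4):
    """Query a coarse grid, then refine only the cells that may hold the surface."""
    coarse = field(grid_points(coarse_res).reshape(-1, 3))
    coarse = coarse.reshape(coarse_res, coarse_res, coarse_res)
    queries = coarse_res ** 3

    # Assumption: the field is 1-Lipschitz, so a cell whose center value
    # exceeds the half-diagonal in magnitude cannot contain a sign change.
    half_diag = np.sqrt(3.0) * (2.0 / coarse_res) / 2.0
    active = np.abs(coarse) <= half_diag

    # Inactive cells keep the coarse value; active cells get fine queries.
    volume = upsample(coarse, factor)
    fine_mask = upsample(active, factor)
    fine_pts = grid_points(coarse_res * factor)[fine_mask]
    volume[fine_mask] = field(fine_pts)
    queries += fine_pts.shape[0]
    return volume, queries


dense, n_dense = dense_decode(sphere_sdf, 64)
hier, n_hier = hierarchical_decode(sphere_sdf, coarse_res=16, factor=4)
print("inside/outside labels agree:", np.array_equal(hier < 0, dense < 0))
print("decoder queries: dense", n_dense, "vs hierarchical", n_hier)
```

Because only cells near the surface are refined, the number of decoder queries grows roughly with the surface area rather than with the full volume. Values in unrefined cells are coarse approximations, so only their sign (inside or outside) should be relied on in this sketch.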
