

Few-step Flow for 3D Generation via Marginal-Data Transport Distillation

September 4, 2025
Authors: Zanwei Zhou, Taoran Yi, Jiemin Fang, Chen Yang, Lingxi Xie, Xinggang Wang, Wei Shen, Qi Tian
cs.AI

Abstract

Flow-based 3D generation models typically require dozens of sampling steps during inference. Though few-step distillation methods, particularly Consistency Models (CMs), have achieved substantial advancements in accelerating 2D diffusion models, they remain under-explored for the more complex task of 3D generation. In this study, we propose a novel framework, MDT-dist, for few-step 3D flow distillation. Our approach is built upon a primary objective: distilling the pretrained model to learn the Marginal-Data Transport. Directly learning this objective requires integrating the velocity fields, but this integral is intractable to compute. Therefore, we propose two optimizable objectives, Velocity Matching (VM) and Velocity Distillation (VD), which equivalently convert the optimization target from the transport level to the velocity level and the distribution level, respectively. Velocity Matching (VM) learns to stably match the velocity fields between the student and the teacher, but inevitably provides biased gradient estimates. Velocity Distillation (VD) further enhances the optimization process by leveraging the learned velocity fields to perform probability density distillation. When evaluated on the pioneering 3D generation framework TRELLIS, our method reduces the sampling steps of each flow transformer from 25 to 1 or 2, achieving 0.68s (1 step x 2) and 0.94s (2 steps x 2) latency on an A800 GPU, a 9.0x and 6.5x speedup respectively, while preserving high visual and geometric fidelity. Extensive experiments demonstrate that our method significantly outperforms existing CM distillation methods and enables TRELLIS to achieve superior performance in few-step 3D generation.
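
To make the abstract's two ingredients more concrete, below is a minimal illustrative sketch in PyTorch of (a) a simplified velocity-matching loss that regresses a student flow model's velocity field onto a frozen teacher's at randomly interpolated points, and (b) a few-step Euler sampler that integrates the learned velocity field. This is a sketch under stated assumptions, not the authors' implementation: the model signature `model(x_t, t)`, the linear interpolation path, and all function names are hypothetical, it only loosely mirrors the VM objective described above, and Velocity Distillation (VD) is omitted entirely.

```python
# Hypothetical sketch of few-step flow distillation, assuming a velocity-parameterized
# flow model v(x_t, t) trained with a linear interpolation path
# x_t = (1 - t) * x0 + t * noise. Names and signatures are illustrative only.
import torch
import torch.nn as nn


def velocity_matching_loss(student: nn.Module, teacher: nn.Module,
                           x0: torch.Tensor) -> torch.Tensor:
    """Simplified velocity matching: regress the student's velocity field onto
    the frozen teacher's at randomly interpolated points between data and noise."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)           # random times in [0, 1]
    t_exp = t.view(-1, *([1] * (x0.dim() - 1)))              # broadcast t over data dims
    x_t = (1.0 - t_exp) * x0 + t_exp * noise                 # linear interpolation path
    with torch.no_grad():
        v_teacher = teacher(x_t, t)                           # frozen teacher velocity
    v_student = student(x_t, t)
    return torch.mean((v_student - v_teacher) ** 2)


@torch.no_grad()
def few_step_sample(student: nn.Module, shape, steps: int = 2,
                    device: str = "cuda") -> torch.Tensor:
    """Few-step Euler sampler: integrate the learned velocity field from
    noise (t = 1) back toward data (t = 0) in `steps` uniform steps."""
    x = torch.randn(shape, device=device)
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        t = torch.full((shape[0],), float(ts[i]), device=device)
        v = student(x, t)                                     # predicted velocity dx/dt
        x = x + (ts[i + 1] - ts[i]) * v                       # Euler step toward data
    return x
```

In this sketch, setting `steps=1` or `steps=2` corresponds to the 1-step and 2-step regimes quoted for each flow transformer in the abstract, versus 25 Euler-style steps for the undistilled teacher.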