Few-step Flow for 3D Generation via Marginal-Data Transport Distillation

Abstract


Flow-based 3D generation models typically require dozens of sampling steps during inference. Although few-step distillation methods, particularly Consistency Models (CMs), have substantially accelerated 2D diffusion models, they remain under-explored for more complex 3D generation tasks. In this study, we propose MDT-dist, a novel framework for few-step 3D flow distillation. Our approach is built on a primary objective: distilling the pretrained model to learn the Marginal-Data Transport. Learning this objective directly requires integrating the velocity fields, but this integral is intractable to implement. We therefore propose two optimizable objectives, Velocity Matching (VM) and Velocity Distillation (VD), which equivalently convert the optimization target from the transport level to the velocity level and the distribution level, respectively. VM learns to stably match the velocity fields of the student and the teacher, but inevitably provides biased gradient estimates. VD further enhances the optimization by using the learned velocity fields to perform probability density distillation. When evaluated on the pioneering 3D generation framework TRELLIS, our method reduces the sampling steps of each flow transformer from 25 to 1 or 2, achieving latencies of 0.68 s (1 step × 2 flow transformers) and 0.94 s (2 steps × 2 flow transformers), i.e., speedups of 9.0× and 6.5× on an A800 GPU, while preserving high visual and geometric fidelity. Extensive experiments demonstrate that our method significantly outperforms existing CM distillation methods and enables TRELLIS to achieve superior performance in few-step 3D generation.
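The claim that learning the marginal-data transport directly means integrating the velocity field can be made concrete with a toy problem. The following is a minimal illustrative sketch, not the MDT-dist implementation. It assumes 1D Gaussian data, a rectified-flow interpolation x_t = (1 - t) x_0 + t ε (an assumed convention), and an analytic marginal velocity standing in for a perfectly trained teacher, so that the exact transport is known in closed form. The constants `MU` and `SIGMA` and all function names are hypothetical. `teacher_transport` integrates the velocity field with explicit Euler steps (25 mirrors the per-transformer step count quoted above), and `fit_one_step_student` is a naive transport-level regression baseline, not the paper's VM or VD objectives.

```python
from __future__ import annotations

# Hypothetical toy setting (not TRELLIS): 1D Gaussian data x0 ~ N(MU, SIGMA^2),
# noise eps ~ N(0, 1), and the assumed rectified-flow path x_t = (1 - t) * x0 + t * eps.
MU, SIGMA = 2.0, 0.5


def marginal_velocity(x: float, t: float) -> float:
    """Exact marginal velocity E[eps - x0 | x_t = x] for the Gaussian toy problem.

    Plays the role of a perfectly trained teacher flow model.
    """
    var_t = (1.0 - t) ** 2 * SIGMA**2 + t**2  # Var[x_t]
    k = (t - (1.0 - t) * SIGMA**2) / var_t  # Cov(eps - x0, x_t) / Var[x_t]
    return -MU + k * (x - (1.0 - t) * MU)


def teacher_transport(x1: float, steps: int) -> float:
    """Approximate the noise-to-data transport by integrating the velocity field
    backward from t = 1 to t = 0 with `steps` explicit Euler steps."""
    x, h = x1, 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * h
        x = x - h * marginal_velocity(x, t)  # one teacher velocity evaluation
    return x


def exact_transport(x1: float) -> float:
    """Closed-form marginal-data transport for this toy: the monotone affine map
    pushing N(0, 1) onto N(MU, SIGMA^2)."""
    return MU + SIGMA * x1


def fit_one_step_student(steps: int = 25) -> tuple[float, float]:
    """Naive transport-level distillation baseline (NOT MDT-dist's VM or VD):
    least-squares fit of an affine one-step student x0 ~= a * x1 + b to teacher
    outputs. Every training target costs `steps` teacher evaluations."""
    xs = [-2.0 + 0.1 * i for i in range(41)]  # deterministic noise samples
    ys = [teacher_transport(x, steps) for x in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx


if __name__ == "__main__":
    x1 = 1.0  # a single fixed noise sample
    for n in (1, 25, 200):
        print(f"{n:>3} Euler steps: {teacher_transport(x1, n):.4f}  exact: {exact_transport(x1):.4f}")
    a, b = fit_one_step_student()
    print(f"one-step student: x0 = {a:.4f} * x1 + {b:.4f}")
```

A single Euler step from pure noise returns `MU` for every `x1`: at t = 1 the marginal velocity equals x1 - E[x0], so the sample collapses to the data mean. This is why simply cutting the step count fails and a distilled one-step map is needed. The regressed student reproduces the 25-step teacher (including its discretization error), but each of its training targets requires a full teacher integration; that per-target integration is the cost that motivates moving the objective to the velocity and distribution levels.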
