Few-step Flow for 3D Generation via Marginal-Data Transport Distillation

Abstract

Flow-based 3D generation models typically require dozens of sampling steps during inference. Although few-step distillation methods, particularly Consistency Models (CMs), have substantially accelerated 2D diffusion models, they remain under-explored for more complex 3D generation tasks. In this study, we propose MDT-dist, a novel framework for few-step 3D flow distillation. Our approach is built upon a primary objective: distilling the pretrained model to learn the Marginal-Data Transport. Learning this objective directly requires integrating the velocity field, and this integral is intractable in practice. We therefore propose two optimizable objectives, Velocity Matching (VM) and Velocity Distillation (VD), which equivalently convert the optimization target from the transport level to the velocity level and the distribution level, respectively. VM learns to stably match the velocity fields of the student and the teacher, but inevitably provides biased gradient estimates. VD further enhances the optimization process by leveraging the learned velocity fields to perform probability density distillation. When evaluated on the pioneering 3D generation framework TRELLIS, our method reduces the sampling steps of each flow transformer from 25 to 1 or 2, achieving latencies of 0.68s (1 step x 2) and 0.94s (2 steps x 2) on an A800, corresponding to 9.0x and 6.5x speedups, while preserving high visual and geometric fidelity. Extensive experiments demonstrate that our method significantly outperforms existing CM distillation methods and enables TRELLIS to achieve superior performance in few-step 3D generation.
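The reported latencies and speedups can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that speedup is defined as baseline latency divided by distilled latency, and that the "x 2" in the abstract refers to two flow transformers that are each sampled with the stated number of steps. The names `configs`, `implied_baseline_latency`, and `summarize` are hypothetical.

```python
# Figures quoted in the abstract (measured on an A800).
configs = {
    "1 step x 2": {"latency_s": 0.68, "speedup": 9.0, "steps_per_transformer": 1},
    "2 steps x 2": {"latency_s": 0.94, "speedup": 6.5, "steps_per_transformer": 2},
}

BASELINE_STEPS = 25    # sampling steps per flow transformer before distillation
NUM_TRANSFORMERS = 2   # assumption: the "x 2" counts the flow transformers


def implied_baseline_latency(latency_s, speedup):
    """Baseline latency, assuming speedup = baseline latency / distilled latency."""
    return latency_s * speedup


def summarize(configs):
    """Derive the implied baseline latency and step counts for each setting."""
    rows = {}
    for name, c in configs.items():
        rows[name] = {
            "implied_baseline_s": round(
                implied_baseline_latency(c["latency_s"], c["speedup"]), 2
            ),
            "total_steps": c["steps_per_transformer"] * NUM_TRANSFORMERS,
            "step_reduction": BASELINE_STEPS / c["steps_per_transformer"],
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(configs).items():
        print(name, row)
```

Under these assumptions, both settings imply a baseline of roughly 6.1 seconds, so the two reported speedups are mutually consistent. The reduction in step count (25x and 12.5x) is larger than the wall-clock speedup (9.0x and 6.5x), which suggests that part of the latency does not scale with the number of sampling steps.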
