TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

Abstract

We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects using a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique specifically designed for texture synthesis that employs regular diffusion model sampling on different 2D rendered views. Specifically, we leverage latent diffusion models, apply the diffusion model's denoiser on a set of 2D renders of the 3D object, and aggregate the different denoising predictions on a shared latent texture map. Final output RGB textures are produced by optimizing an intermediate neural color field on the decodings of 2D renders of the latent texture. We thoroughly validate TexFusion and show that we can efficiently generate diverse, high-quality, and globally coherent textures. We achieve state-of-the-art text-guided texture synthesis performance using only image diffusion models, while avoiding the pitfalls of previous distillation-based methods. Text conditioning offers detailed control, and we do not rely on any ground-truth 3D textures for training. This makes our method versatile and applicable to a broad range of geometry and texture types. We hope that TexFusion will advance AI-based texturing of 3D assets for applications in virtual reality, game design, simulation, and more.
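The aggregation step mentioned in the abstract, where predictions from several rendered views are combined on one shared texture map, can be illustrated with a small sketch. The abstract does not specify the aggregation rule, so the weighted average below is an assumption made purely for illustration, and the function name `aggregate_views`, the pixel-to-texel lookup `texel_ids`, and the per-pixel `weights` are all hypothetical; in a real pipeline the lookup would come from rasterizing the mesh.

```python
def aggregate_views(views, num_texels, channels):
    """Blend per-view predictions into one shared texture by weighted averaging.

    Illustrative only: the weighted average is an assumed rule, not the paper's.

    views: iterable of (texel_ids, preds, weights) triples, one per camera view:
      texel_ids[i] is the index of the texel seen by pixel i (hypothetical lookup),
      preds[i] is that pixel's predicted vector with `channels` entries,
      weights[i] is a non-negative confidence for that pixel.
    Returns (texture, seen): texture[t] is the averaged vector for texel t, and
    seen[t] is False for texels that no view observed (these stay at zero).
    """
    accum = [[0.0] * channels for _ in range(num_texels)]
    weight_sum = [0.0] * num_texels

    # Accumulate weighted predictions; texels hit by several views add up.
    for texel_ids, preds, weights in views:
        for t, pred, w in zip(texel_ids, preds, weights):
            for c in range(channels):
                accum[t][c] += w * pred[c]
            weight_sum[t] += w

    # Normalize only the texels that received some weight.
    seen = [w > 0.0 for w in weight_sum]
    texture = [
        [a / weight_sum[t] for a in accum[t]] if seen[t] else accum[t]
        for t in range(num_texels)
    ]
    return texture, seen


# Two views that overlap on texel 1; texel 3 is never observed.
views = [
    ([0, 1], [[1.0], [2.0]], [1.0, 1.0]),
    ([1, 2], [[4.0], [6.0]], [3.0, 1.0]),
]
texture, seen = aggregate_views(views, num_texels=4, channels=1)
```

Because every view writes into the same map, a texel visible from several cameras receives a single blended value, which is the intuition behind keeping the views consistent with one another.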
