Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Abstract

This paper presents Paint3D, a novel coarse-to-fine generative framework that produces high-resolution (2K), lighting-less, and diverse UV texture maps for untextured 3D meshes, conditioned on text or image inputs. The key challenge is generating high-quality textures without embedded illumination information, so that the textures can be relit or re-edited within modern graphics pipelines. To achieve this, our method first leverages a pre-trained depth-aware 2D diffusion model to generate view-conditional images and then performs multi-view texture fusion, producing an initial coarse texture map. However, because 2D models can neither fully represent 3D shapes nor disable lighting effects, the coarse texture map exhibits incomplete areas and illumination artifacts. To resolve this, we train separate UV Inpainting and UVHD diffusion models, specialized respectively for the shape-aware refinement of incomplete areas and the removal of illumination artifacts. Through this coarse-to-fine process, Paint3D produces high-quality 2K UV textures that maintain semantic consistency while being lighting-less, significantly advancing the state of the art in texturing 3D objects.
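The abstract does not say how the multi-view texture fusion step merges the per-view images. The sketch below assumes one common approach: each generated view has already been back-projected into UV space together with a per-texel weight (for example, derived from the angle between the surface normal and the view direction, with 0 meaning the texel is not visible), and the views are combined by weighted averaging. The function `fuse_views` and its input layout are hypothetical illustrations, not Paint3D's actual implementation. Texels that no view sees remain flagged as missing, which corresponds to the "incomplete areas" that the UV Inpainting stage is described as filling.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Grid = List[List[float]]


def fuse_views(
    view_textures: List[List[List[RGB]]],
    view_weights: List[Grid],
) -> Tuple[List[List[RGB]], List[List[bool]]]:
    """Hypothetical weighted-average fusion of views already projected into UV space.

    view_textures[k][v][u] is the color that view k assigns to texel (u, v).
    view_weights[k][v][u] is that view's confidence for the texel (0 = not visible).
    Returns the fused UV texture and a mask that is True wherever no view
    covered the texel (left for a later inpainting pass to fill).
    """
    height = len(view_weights[0])
    width = len(view_weights[0][0])
    fused: List[List[RGB]] = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    missing = [[True] * width for _ in range(height)]

    for v in range(height):
        for u in range(width):
            total = 0.0
            acc = [0.0, 0.0, 0.0]
            for tex, wts in zip(view_textures, view_weights):
                w = wts[v][u]
                if w <= 0.0:
                    continue  # this view does not see the texel
                total += w
                for c in range(3):
                    acc[c] += w * tex[v][u][c]
            if total > 0.0:
                # Normalize so the fused color is a convex combination of the views.
                fused[v][u] = (acc[0] / total, acc[1] / total, acc[2] / total)
                missing[v][u] = False
    return fused, missing


# Two views over a 1x2 UV grid: both see texel 0, neither sees texel 1.
tex_a = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
tex_b = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
fused, missing = fuse_views([tex_a, tex_b], [[[3.0, 0.0]], [[1.0, 0.0]]])
print(fused[0][0], missing)  # (0.75, 0.0, 0.25) [[False, True]]
```

Weighted averaging blends overlapping views smoothly, but it cannot invent color for unseen texels or remove shading that the 2D model baked into each view; under this sketch's assumptions, those two gaps are exactly what a separate refinement stage would have to address.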