TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

Abstract

We present TexFusion (Texture Diffusion), a new method for synthesizing textures for given 3D geometries using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects through a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique, designed specifically for texture synthesis, that employs regular diffusion model sampling on different 2D rendered views. Specifically, we leverage latent diffusion models, apply the diffusion model's denoiser to a set of 2D renders of the 3D object, and aggregate the different denoising predictions on a shared latent texture map. The final output RGB textures are produced by optimizing an intermediate neural color field on the decodings of 2D renders of the latent texture. We thoroughly validate TexFusion and show that it can efficiently generate diverse, high-quality, and globally coherent textures. We achieve state-of-the-art text-guided texture synthesis performance using only image diffusion models, while avoiding the pitfalls of previous distillation-based methods. Text conditioning offers detailed control, and we do not rely on any ground-truth 3D textures for training. This makes our method versatile and applicable to a broad range of geometry and texture types. We hope that TexFusion will advance AI-based texturing of 3D assets for applications in virtual reality, game design, simulation, and more.
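To make the aggregation idea concrete, the following is a minimal toy sketch of a single step in which per-view denoising predictions are written back into a shared latent texture. It is not the authors' implementation. The names `render_view`, `aggregate_step`, and `toy_denoiser` are hypothetical, the pixel-to-texel index lists stand in for a real rasterizer, the denoiser is a placeholder for a latent diffusion model, and plain per-texel averaging is an assumed aggregation rule that the abstract does not specify.

```python
from typing import Callable, List, Sequence

Latent = List[float]  # one latent vector (C channels) for a texel or a pixel


def render_view(texture: Sequence[Latent], pixel_to_texel: Sequence[int]) -> List[Latent]:
    """Gather texel latents into a flat 2D view; index -1 marks a background pixel."""
    channels = len(texture[0])
    return [list(texture[t]) if t >= 0 else [0.0] * channels for t in pixel_to_texel]


def aggregate_step(
    texture: Sequence[Latent],
    views: Sequence[Sequence[int]],
    denoise: Callable[[List[Latent]], List[Latent]],
) -> List[Latent]:
    """Denoise every rendered view, then average the predictions per texel.

    Assumption: simple averaging. Texels that no view sees keep their old value.
    """
    channels = len(texture[0])
    sums = [[0.0] * channels for _ in texture]
    counts = [0] * len(texture)
    for pixel_to_texel in views:
        prediction = denoise(render_view(texture, pixel_to_texel))
        for pixel, texel in enumerate(pixel_to_texel):
            if texel < 0:
                continue  # background pixels contribute nothing
            counts[texel] += 1
            for c in range(channels):
                sums[texel][c] += prediction[pixel][c]
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else list(texture[i])
        for i in range(len(texture))
    ]


def toy_denoiser(view: List[Latent]) -> List[Latent]:
    """Placeholder for a diffusion denoiser: halves every latent value."""
    return [[0.5 * v for v in pixel] for pixel in view]


# Three texels with two latent channels each; two views, each with three pixels.
texture = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
views = [[0, 1, -1], [1, 0, 1]]
updated = aggregate_step(texture, views, toy_denoiser)
print(updated)
```

In this toy setup, texels 0 and 1 are visible from both views and receive the averaged prediction, while texel 2 is never rendered and is left unchanged. A real pipeline would repeat such a step across the diffusion sampling schedule and would need to handle visibility, resolution, and weighting far more carefully.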
