NaTex: Seamless Texture Generation as Latent Color Diffusion

Abstract

We present NaTex, a native texture generation framework that predicts texture color directly in 3D space. In contrast to previous approaches that rely on baking 2D multi-view images synthesized by geometry-conditioned Multi-View Diffusion models (MVDs), NaTex avoids several inherent limitations of the MVD pipeline. These include difficulties in handling occluded regions that require inpainting, achieving precise mesh-texture alignment along boundaries, and maintaining cross-view consistency and coherence in both content and color intensity. NaTex features a novel paradigm that addresses these issues by viewing texture as a dense color point cloud. Driven by this idea, we propose latent color diffusion, which comprises a geometry-aware color point cloud VAE and a multi-control diffusion transformer (DiT), both trained entirely from scratch on 3D data, for texture reconstruction and generation. To enable precise alignment, we introduce native geometry control, which conditions the DiT on direct 3D spatial information via positional embeddings and geometry latents. We co-design the VAE-DiT architecture so that the geometry latents are extracted via a dedicated geometry branch tightly coupled with the color VAE, providing fine-grained surface guidance that maintains strong correspondence with the texture. With these designs, NaTex demonstrates strong performance, significantly outperforming previous methods in texture coherence and alignment. Moreover, NaTex exhibits strong generalization, either training-free or with simple tuning, to various downstream applications, e.g., material generation, texture refinement, and part segmentation and texturing.
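
To make the idea of viewing texture as a dense color point cloud concrete, the following is a minimal sketch under stated assumptions, not NaTex's actual data pipeline. It assumes a triangle mesh with per-vertex RGB colors (a real textured asset would more likely store colors in a UV texture map) and draws area-weighted surface samples whose colors are interpolated with barycentric coordinates. The function name `sample_color_point_cloud` and all of its parameters are hypothetical.

```python
import numpy as np


def sample_color_point_cloud(vertices, faces, vertex_colors, n_points, seed=0):
    """Sample surface points with interpolated RGB colors.

    Illustrative only: assumes per-vertex colors, which is a simplification
    of sampling colors from a UV texture map.
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]  # (F, 3, 3): corner positions of each triangle

    # Triangle areas, so that larger faces receive proportionally more samples.
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())

    # Uniform barycentric coordinates inside a triangle (square-root trick).
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    bary = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)  # (N, 3)

    # Interpolate positions and colors with the same barycentric weights.
    points = np.einsum("nk,nkd->nd", bary, tri[face_idx])
    colors = np.einsum("nk,nkd->nd", bary, vertex_colors[faces[face_idx]])
    return points, colors


if __name__ == "__main__":
    # A unit square in the z = 0 plane, split into two triangles.
    vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
    faces = np.array([[0, 1, 2], [0, 2, 3]])
    vertex_colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
    pts, cols = sample_color_point_cloud(vertices, faces, vertex_colors, 1000)
    print(pts.shape, cols.shape)
```

Each sample pairs a 3D position with an RGB value, which is the kind of input a color point cloud VAE could consume. How NaTex actually samples, encodes, and conditions on such points is not specified in the abstract.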
