TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

Abstract

Automatically generating photorealistic and self-consistent appearances for untextured 3D models is a critical challenge in digital content creation. The advancement of large-scale video generation models offers a natural approach: directly synthesizing 360-degree turntable videos (TTVs), which can serve not only as high-quality dynamic previews but also as an intermediate representation to drive texture synthesis and neural rendering. However, existing general-purpose video diffusion models struggle to maintain strict geometric consistency and appearance stability across the full range of views, making their outputs ill-suited for high-quality 3D reconstruction. To this end, we introduce TAPESTRY, a framework for generating high-fidelity TTVs conditioned on explicit 3D geometry. We reframe the 3D appearance generation task as a geometry-conditioned video diffusion problem: given a 3D mesh, we first render and encode multi-modal geometric features to constrain the video generation process with pixel-level precision, thereby enabling the creation of high-quality and consistent TTVs. Building upon this, we also design a method for downstream reconstruction tasks from the TTV input, featuring a multi-stage pipeline with 3D-Aware Inpainting. By rotating the model and performing a context-aware secondary generation, this pipeline effectively completes self-occluded regions to achieve full surface coverage. The videos generated by TAPESTRY are not only high-quality dynamic previews but also serve as a reliable, 3D-aware intermediate representation that can be seamlessly back-projected into UV textures or used to supervise neural rendering methods like 3DGS. This enables the automated creation of production-ready, complete 3D assets from untextured meshes. Experimental results demonstrate that our method outperforms existing approaches in both video consistency and final reconstruction quality.
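
As a rough illustration of what a 360-degree turntable sequence implies geometrically, the sketch below places cameras at evenly spaced azimuths on a circle around an object centered at the origin. It is not the TAPESTRY implementation: the function name `turntable_cameras`, the frame count, the orbit radius, the elevation angle, and the y-up convention are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def turntable_cameras(num_frames, radius=2.5, elevation_deg=15.0):
    """Return a list of (azimuth_deg, (x, y, z)) camera positions.

    Cameras sit on a circle around the origin at a fixed elevation,
    using a y-up convention. All default values are illustrative
    assumptions, not settings taken from the paper.
    """
    elev = math.radians(elevation_deg)
    cameras = []
    for i in range(num_frames):
        # Evenly spaced azimuths covering the full 360 degrees.
        az = 2.0 * math.pi * i / num_frames
        x = radius * math.cos(elev) * math.sin(az)
        y = radius * math.sin(elev)
        z = radius * math.cos(elev) * math.cos(az)
        cameras.append((math.degrees(az), (x, y, z)))
    return cameras


if __name__ == "__main__":
    for azimuth, position in turntable_cameras(8):
        print(f"azimuth={azimuth:6.1f}  position={position}")
```

Each camera would look at the origin. Geometric features rendered from the mesh at these poses are the kind of per-frame conditioning signal the abstract describes, and a second orbit after rotating the model could expose regions that were self-occluded in the first pass.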
