TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos
March 18, 2026
Authors: Yan Zeng, Haoran Jiang, Kaixin Yao, Qixuan Zhang, Longwen Zhang, Lan Xu, Jingyi Yu
cs.AI
Abstract
Automatically generating photorealistic and self-consistent appearances for untextured 3D models is a critical challenge in digital content creation. The advancement of large-scale video generation models offers a natural approach: directly synthesizing 360-degree turntable videos (TTVs), which serve not only as high-quality dynamic previews but also as an intermediate representation to drive texture synthesis and neural rendering. However, existing general-purpose video diffusion models struggle to maintain strict geometric consistency and appearance stability across the full range of views, making their outputs ill-suited for high-quality 3D reconstruction. To this end, we introduce TAPESTRY, a framework for generating high-fidelity TTVs conditioned on explicit 3D geometry. We reframe 3D appearance generation as a geometry-conditioned video diffusion problem: given a 3D mesh, we first render and encode multi-modal geometric features to constrain the video generation process with pixel-level precision, enabling the creation of high-quality, consistent TTVs. Building on this, we design a downstream reconstruction method for TTV inputs, featuring a multi-stage pipeline with 3D-aware inpainting. By rotating the model and performing a context-aware secondary generation, this pipeline effectively completes self-occluded regions to achieve full surface coverage. The videos generated by TAPESTRY are not only high-quality dynamic previews but also a reliable, 3D-aware intermediate representation that can be seamlessly back-projected into UV textures or used to supervise neural rendering methods such as 3DGS, enabling the automated creation of production-ready, complete 3D assets from untextured meshes. Experimental results demonstrate that our method outperforms existing approaches in both video consistency and final reconstruction quality.