StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

Abstract

We present StdGEN, a pipeline for generating semantically decomposed, high-quality 3D characters from single images, with broad applications in virtual reality, gaming, and filmmaking. Previous methods struggle with limited decomposability, unsatisfactory quality, and long optimization times. StdGEN instead offers decomposability, effectiveness, and efficiency: it generates intricately detailed 3D characters with separated semantic components such as the body, clothes, and hair in three minutes. At the core of StdGEN is our proposed Semantic-aware Large Reconstruction Model (S-LRM), a transformer-based generalizable model that jointly reconstructs geometry, color, and semantics from multi-view images in a feed-forward manner. We introduce a differentiable multi-layer semantic surface extraction scheme to acquire meshes from the hybrid implicit fields reconstructed by S-LRM. Additionally, a specialized efficient multi-view diffusion model and an iterative multi-layer surface refinement module are integrated into the pipeline to facilitate high-quality, decomposable 3D character generation. Extensive experiments demonstrate state-of-the-art performance in 3D anime character generation, surpassing existing baselines by a significant margin in geometry, texture, and decomposability. StdGEN offers ready-to-use semantically decomposed 3D characters and enables flexible customization for a wide range of applications. Project page: https://stdgen.github.io
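To make "separated semantic components" concrete, the following is a minimal toy sketch of what a decomposed character looks like to downstream code: one triangle mesh per semantic label. It is not the S-LRM surface extraction scheme described above, which operates on hybrid implicit fields. It simply splits an already labeled mesh by face label. The function name `split_mesh_by_label`, the input layout, and the label strings are all assumptions made for illustration.

```python
def split_mesh_by_label(vertices, faces, face_labels):
    """Split one labeled triangle mesh into one sub-mesh per semantic label.

    vertices:    list of (x, y, z) tuples
    faces:       list of (i, j, k) vertex-index triples
    face_labels: one label string per face (hypothetical labels, e.g. "body")
    Returns {label: (sub_vertices, sub_faces)} with faces re-indexed
    against each sub-mesh's own vertex list.
    """
    if len(faces) != len(face_labels):
        raise ValueError("need exactly one label per face")

    parts = {}  # label -> (sub_vertices, sub_faces, old_index -> new_index)
    for face, label in zip(faces, face_labels):
        sub_vertices, sub_faces, index_map = parts.setdefault(label, ([], [], {}))
        new_face = []
        for old_index in face:
            # Copy each vertex into the sub-mesh the first time it is used there.
            if old_index not in index_map:
                index_map[old_index] = len(sub_vertices)
                sub_vertices.append(vertices[old_index])
            new_face.append(index_map[old_index])
        sub_faces.append(tuple(new_face))

    # Drop the bookkeeping map from the returned structure.
    return {label: (v, f) for label, (v, f, _) in parts.items()}


# Toy input: three triangles, two labeled "body" and one labeled "hair".
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 2, 0)]
faces = [(0, 1, 2), (1, 3, 2), (2, 3, 4)]
face_labels = ["body", "body", "hair"]

parts = split_mesh_by_label(vertices, faces, face_labels)
for label, (sub_vertices, sub_faces) in parts.items():
    print(label, len(sub_vertices), "vertices,", len(sub_faces), "faces")
```

Vertices shared across a label boundary are duplicated into each sub-mesh, so every component can be edited, swapped, or exported independently, which is the kind of customization the abstract refers to.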
