

GALA: Generating Animatable Layered Assets from a Single Scan

January 23, 2024
Authors: Taeksoo Kim, Byungjun Kim, Shunsuke Saito, Hanbyul Joo
cs.AI

Abstract

We present GALA, a framework that takes as input a single-layer clothed 3D human mesh and decomposes it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create novel clothed human avatars with any pose. Existing reconstruction approaches often treat clothed humans as a single-layer of geometry and overlook the inherent compositionality of humans with hairstyles, clothing, and accessories, thereby limiting the utility of the meshes for downstream applications. Decomposing a single-layer mesh into separate layers is a challenging task because it requires the synthesis of plausible geometry and texture for the severely occluded regions. Moreover, even with successful decomposition, meshes are not normalized in terms of poses and body shapes, failing coherent composition with novel identities and poses. To address these challenges, we propose to leverage the general knowledge of a pretrained 2D diffusion model as geometry and appearance prior for humans and other assets. We first separate the input mesh using the 3D surface segmentation extracted from multi-view 2D segmentations. Then we synthesize the missing geometry of different layers in both posed and canonical spaces using a novel pose-guided Score Distillation Sampling (SDS) loss. Once we complete inpainting high-fidelity 3D geometry, we also apply the same SDS loss to its texture to obtain the complete appearance including the initially occluded regions. Through a series of decomposition steps, we obtain multiple layers of 3D assets in a shared canonical space normalized in terms of poses and human shapes, hence supporting effortless composition to novel identities and reanimation with novel poses. Our experiments demonstrate the effectiveness of our approach for decomposition, canonicalization, and composition tasks compared to existing solutions.
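The abstract's core optimization tool is Score Distillation Sampling: a frozen 2D diffusion model scores noisy renderings of the 3D asset, and the mismatch between predicted and injected noise drives the geometry and texture updates. A minimal NumPy sketch of one generic SDS gradient step is below; the function names (`sds_gradient`, `predict_noise`), the weighting `w_t = 1 - alpha_bar_t`, and the omission of GALA's pose guidance are all simplifying assumptions, not the paper's actual implementation.

```python
import numpy as np

def sds_gradient(rendered, predict_noise, alphas_cumprod, t, rng):
    """One generic SDS gradient step (simplified sketch, not GALA's exact loss).

    rendered:       image rendered from the current 3D asset, shape (H, W, C)
    predict_noise:  frozen diffusion model's noise predictor eps(x_t, t)
    alphas_cumprod: cumulative noise schedule alpha_bar, shape (T,)
    t:              sampled diffusion timestep
    """
    a_t = alphas_cumprod[t]
    noise = rng.standard_normal(rendered.shape)
    # Forward diffusion: corrupt the rendering to timestep t.
    x_t = np.sqrt(a_t) * rendered + np.sqrt(1.0 - a_t) * noise
    # The frozen 2D prior predicts the injected noise (no gradient through it).
    eps_pred = predict_noise(x_t, t)
    w_t = 1.0 - a_t  # one common per-timestep weighting choice
    # This quantity is backpropagated only through the differentiable renderer.
    return w_t * (eps_pred - noise)
```

In practice the gradient is applied to the asset's shape or texture parameters through a differentiable renderer; GALA additionally conditions the diffusion prior on pose, which this sketch omits.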