TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
August 16, 2023
Authors: Yangyi Huang, Hongwei Yi, Yuliang Xiu, Tingting Liao, Jiaxiang Tang, Deng Cai, Justus Thies
cs.AI
Abstract
Despite recent research advancements in reconstructing clothed humans from a
single image, accurately restoring the "unseen regions" with high-level detail
remains an open challenge that has received little attention. Existing methods
often generate overly smooth back-side surfaces with blurry textures. But how can
we effectively capture, from a single image, all the visual attributes of an
individual that are sufficient to reconstruct unseen areas (e.g., the back view)?
Motivated by the power of foundation models, TeCH reconstructs the 3D human by
leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles)
which are automatically generated via a garment parsing model and Visual
Question Answering (VQA), and 2) a personalized, fine-tuned Text-to-Image (T2I)
diffusion model that learns the "indescribable" appearance. To represent
high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D
representation based on DMTet, which consists of an explicit body shape grid
and an implicit distance field. Guided by the descriptive prompts and the
personalized T2I diffusion model, the geometry and texture of the 3D humans are
optimized through multi-view Score Distillation Sampling (SDS) and
reconstruction losses based on the original observation. TeCH produces
high-fidelity 3D clothed humans with consistent, delicate textures and
detailed full-body geometry. Quantitative and qualitative experiments
demonstrate that TeCH outperforms the state-of-the-art methods in terms of
reconstruction accuracy and rendering quality. The code will be publicly
available for research purposes at https://huangyangyi.github.io/tech.
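For readers unfamiliar with the prompt-generation step, the sketch below shows how appearance attributes can be extracted from the input image with an off-the-shelf VQA model and assembled into a descriptive text prompt. It is a minimal illustration, assuming BLIP-VQA from Hugging Face transformers and a hypothetical question list; the abstract does not specify which garment parser, VQA model, or questions TeCH actually uses.

```python
# Minimal sketch of VQA-driven prompt generation. Illustrative assumptions:
# BLIP-VQA as the VQA model and this particular question list; TeCH's actual
# garment parsing model and question set may differ.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

QUESTIONS = {
    "garment": "What kind of clothing is the person wearing?",
    "color": "What color is the clothing?",
    "hairstyle": "What is the person's hairstyle?",
}

def describe_person(image_path: str) -> str:
    """Answer each attribute question, then join the answers into a prompt."""
    image = Image.open(image_path).convert("RGB")
    answers = {}
    for key, question in QUESTIONS.items():
        inputs = processor(image, question, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=20)
        answers[key] = processor.decode(out[0], skip_special_tokens=True)
    return ("a person wearing a {color} {garment}, "
            "with a {hairstyle} hairstyle").format(**answers)

# Example: print(describe_person("front_view.png"))
```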
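The "Score Distillation Sampling" mentioned above is the optimization technique introduced in DreamFusion (Poole et al., 2022). For background only, the standard SDS gradient that such pipelines backpropagate into the 3D representation is written out below; TeCH's exact loss weighting and multi-view scheduling are not given in the abstract.

```latex
% Standard SDS gradient (DreamFusion); background, not TeCH's exact loss.
% x = g(\theta): a view rendered from the 3D representation with parameters \theta
% z_t = \alpha_t x + \sigma_t \epsilon: the rendered view noised to timestep t
% \hat{\epsilon}_\phi(z_t; y, t): noise predicted by the (personalized) T2I model
% conditioned on the text prompt y; w(t): a timestep-dependent weight
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad z_t = \alpha_t\, x + \sigma_t\, \epsilon .
```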