TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
August 16, 2023
Authors: Yangyi Huang, Hongwei Yi, Yuliang Xiu, Tingting Liao, Jiaxiang Tang, Deng Cai, Justus Thies
cs.AI
Abstract
Despite recent research advancements in reconstructing clothed humans from a
single image, accurately restoring the "unseen regions" with high-level detail
remains an open challenge that has received little attention. Existing methods
often generate overly smooth back-side surfaces with blurry textures. But how can
we effectively capture, from a single image, all the visual attributes of an
individual that are sufficient to reconstruct unseen areas (e.g., the back view)?
Motivated by the power of foundation models, TeCH reconstructs the 3D human by
leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles)
which are automatically generated via a garment parsing model and Visual
Question Answering (VQA), and 2) a personalized, fine-tuned Text-to-Image (T2I)
diffusion model that learns the "indescribable" appearance. To represent
high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D
representation based on DMTet, which consists of an explicit body shape grid
and an implicit distance field. Guided by the descriptive prompts and the
personalized T2I diffusion model, the geometry and texture of the 3D humans are
optimized through multi-view Score Distillation Sampling (SDS) and
reconstruction losses based on the original observation. TeCH produces
high-fidelity 3D clothed humans with consistent, delicate textures and
detailed full-body geometry. Quantitative and qualitative experiments
demonstrate that TeCH outperforms the state-of-the-art methods in terms of
reconstruction accuracy and rendering quality. The code will be publicly
available for research purposes at https://huangyangyi.github.io/tech.
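For readers unfamiliar with the prompt-generation step, the sketch below shows how appearance attributes can be extracted from the input image with an off-the-shelf VQA model and assembled into a descriptive text prompt. It is a minimal illustration, assuming BLIP-VQA from Hugging Face transformers and a hypothetical question list; the abstract does not specify which garment parser, VQA model, or questions TeCH actually uses.

```python
# Minimal sketch of VQA-driven prompt generation. Illustrative assumptions:
# BLIP-VQA as the VQA model and this particular question list; TeCH's actual
# garment parsing model and question set may differ.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

QUESTIONS = {
    "garment": "What kind of clothing is the person wearing?",
    "color": "What color is the clothing?",
    "hairstyle": "What is the person's hairstyle?",
}

def describe_person(image_path: str) -> str:
    """Answer each attribute question, then join the answers into a prompt."""
    image = Image.open(image_path).convert("RGB")
    answers = {}
    for key, question in QUESTIONS.items():
        inputs = processor(image, question, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=20)
        answers[key] = processor.decode(out[0], skip_special_tokens=True)
    return ("a person wearing a {color} {garment}, "
            "with a {hairstyle} hairstyle").format(**answers)

# Example: print(describe_person("front_view.png"))
```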
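The "Score Distillation Sampling" mentioned above is the optimization technique introduced in DreamFusion (Poole et al., 2022). For background only, the standard SDS gradient that such pipelines backpropagate into the 3D representation is written out below; TeCH's exact loss weighting and multi-view scheduling are not given in the abstract.

```latex
% Standard SDS gradient (DreamFusion); background, not TeCH's exact loss.
% x = g(\theta): a view rendered from the 3D representation with parameters \theta
% z_t = \alpha_t x + \sigma_t \epsilon: the rendered view noised to timestep t
% \hat{\epsilon}_\phi(z_t; y, t): noise predicted by the (personalized) T2I model
% conditioned on the text prompt y; w(t): a timestep-dependent weight
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad z_t = \alpha_t\, x + \sigma_t\, \epsilon .
```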