TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Abstract

Despite recent advances in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with a high level of detail remains an open challenge that has received little attention. Existing methods often generate overly smooth back-side surfaces with blurry textures. How, then, can we capture from a single image all the visual attributes of an individual that are needed to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles), which are automatically generated via a garment parsing model and Visual Question Answering (VQA), and 2) a personalized, fine-tuned Text-to-Image diffusion model (T2I), which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts and the personalized T2I diffusion model, the geometry and texture of the 3D human are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent and delicate texture and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/tech
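
The optimization described in the abstract, SDS guidance combined with a reconstruction loss on the observed view, can be illustrated with a toy sketch. This is not the TeCH implementation; it only shows the shape of the update under several stated assumptions. The "render" is taken to be the pixel vector itself (a real system would backpropagate through a differentiable renderer into the DMTet geometry and texture), `toy_noise_predictor` is a hypothetical stand-in for the personalized T2I diffusion model, and the weighting w(t) = 1 - alpha_bar_t is one common choice rather than one stated in the source.

```python
import math
import random


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a text-conditioned diffusion model.

    For a data distribution concentrated on `target`, the ideal noise
    prediction is (x_t - sqrt(alpha_bar) * target) / sqrt(1 - alpha_bar).
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - a * tg) / s for xt, tg in zip(x_t, target)]


def sds_gradient(x, target, alpha_bar, rng):
    """One-sample SDS gradient with respect to the rendered pixels x."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x]             # sampled Gaussian noise
    x_t = [a * xi + s * e for xi, e in zip(x, eps)]    # noised render
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    w = 1.0 - alpha_bar                                # assumed weighting w(t)
    # SDS skips the denoiser Jacobian: grad = w(t) * (eps_hat - eps)
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(x, target, observed, mask, steps=200, lr=0.1, lam=1.0, seed=0):
    """Gradient descent on SDS guidance plus a masked reconstruction loss."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        alpha_bar = rng.uniform(0.02, 0.98)            # random noise level
        g_sds = sds_gradient(x, target, alpha_bar, rng)
        # Gradient of 0.5 * (x - observed)^2, applied to visible pixels only
        g_rec = [m * (xi - o) for xi, o, m in zip(x, observed, mask)]
        x = [xi - lr * (gs + lam * gr) for xi, gs, gr in zip(x, g_sds, g_rec)]
    return x


if __name__ == "__main__":
    target = [0.2, 0.8, 0.5, 0.1]      # what the toy "prior" believes in
    observed = [0.2, 0.8, 0.0, 0.0]    # input view; last two pixels unseen
    mask = [1.0, 1.0, 0.0, 0.0]        # 1 = visible in the input image
    print(optimize([0.0] * 4, target, observed, mask))
```

In this toy setting the visible pixels are pulled toward the observation by both terms, while the unseen pixels are driven only by the prior, which mirrors the role the abstract assigns to the diffusion model for regions such as the back view.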
