Text-Guided Generation and Editing of Compositional 3D Avatars
September 13, 2023
Authors: Hao Zhang, Yao Feng, Peter Kulits, Yandong Wen, Justus Thies, Michael J. Black
cs.AI
Abstract
Our goal is to create a realistic 3D facial avatar with hair and accessories
using only a text description. While this challenge has attracted significant
recent interest, existing methods either lack realism, produce unrealistic
shapes, or do not support editing, such as modifications to the hairstyle. We
argue that existing methods are limited because they employ a monolithic
modeling approach, using a single representation for the head, face, hair, and
accessories. Our observation is that the hair and face, for example, have very
different structural qualities that benefit from different representations.
Building on this insight, we generate avatars with a compositional model, in
which the head, face, and upper body are represented with traditional 3D
meshes, and the hair, clothing, and accessories with neural radiance fields
(NeRF). The model-based mesh representation provides a strong geometric prior
for the face region, improving realism while enabling editing of the person's
appearance. By using NeRFs to represent the remaining components, our method is
able to model and synthesize parts with complex geometry and appearance, such
as curly hair and fluffy scarves. Our novel system synthesizes these
high-quality compositional avatars from text descriptions. The experimental
results demonstrate that our method, Text-guided generation and Editing of
Compositional Avatars (TECA), produces avatars that are more realistic than
those of recent methods while being editable because of their compositional
nature. For example, our TECA enables the seamless transfer of compositional
features like hairstyles, scarves, and other accessories between avatars. This
capability supports applications such as virtual try-on.
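The compositional idea described above — a mesh-based face plus independent NeRF parts for hair and accessories — can be sketched as a simple data structure. This is a minimal illustration, not the paper's actual code; the class and slot names (`MeshComponent`, `NeRFComponent`, `"hair"`, `"scarf"`) are hypothetical, and `params` merely stands in for learned NeRF weights:

```python
from dataclasses import dataclass, field


@dataclass
class MeshComponent:
    """Head, face, and upper body represented by a traditional 3D mesh
    (TECA uses a statistical head model for this; here just a label)."""
    name: str


@dataclass
class NeRFComponent:
    """Hair, clothing, or accessory represented by a neural radiance
    field; `params` is a placeholder for the learned network weights."""
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class CompositionalAvatar:
    """An avatar is a mesh base plus named NeRF parts, keyed by slot
    (e.g. 'hair', 'scarf')."""
    base: MeshComponent
    parts: dict  # slot -> NeRFComponent

    def transfer_part_from(self, other: "CompositionalAvatar", slot: str) -> None:
        # Because each part is an independent component, transferring a
        # hairstyle or scarf between avatars is a simple reassignment,
        # with no retraining of the face representation.
        self.parts[slot] = other.parts[slot]


# Hypothetical usage: move one avatar's hairstyle onto another (virtual try-on).
a = CompositionalAvatar(MeshComponent("face_a"), {"hair": NeRFComponent("curly")})
b = CompositionalAvatar(MeshComponent("face_b"), {"hair": NeRFComponent("straight")})
b.transfer_part_from(a, "hair")
print(b.parts["hair"].name)  # → curly
```

The point of the sketch is the editability claim in the abstract: because the face mesh and each NeRF part are separate components, swapping a part touches only its slot and leaves the rest of the avatar unchanged.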