
Text-Guided Generation and Editing of Compositional 3D Avatars

September 13, 2023
Authors: Hao Zhang, Yao Feng, Peter Kulits, Yandong Wen, Justus Thies, Michael J. Black
cs.AI

Abstract

Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description. While this challenge has attracted significant recent interest, existing methods either lack realism, produce unrealistic shapes, or do not support editing, such as modifications to the hairstyle. We argue that existing methods are limited because they employ a monolithic modeling approach, using a single representation for the head, face, hair, and accessories. Our observation is that the hair and face, for example, have very different structural qualities that benefit from different representations. Building on this insight, we generate avatars with a compositional model, in which the head, face, and upper body are represented with traditional 3D meshes, and the hair, clothing, and accessories with neural radiance fields (NeRF). The model-based mesh representation provides a strong geometric prior for the face region, improving realism while enabling editing of the person's appearance. By using NeRFs to represent the remaining components, our method is able to model and synthesize parts with complex geometry and appearance, such as curly hair and fluffy scarves. Our novel system synthesizes these high-quality compositional avatars from text descriptions. The experimental results demonstrate that our method, Text-guided generation and Editing of Compositional Avatars (TECA), produces avatars that are more realistic than those of recent methods while being editable because of their compositional nature. For example, our TECA enables the seamless transfer of compositional features like hairstyles, scarves, and other accessories between avatars. This capability supports applications such as virtual try-on.
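The abstract's core idea is hybrid rendering: the face and body come from a rasterized mesh, while hair and accessories are volume-rendered NeRF layers composited in front of it. A minimal sketch of that compositing step for a single ray is shown below; this is a hypothetical illustration, not the authors' implementation, and `composite_ray` and its arguments are invented names. NeRF samples behind the mesh surface are occluded, and whatever transmittance remains after the volume is applied to the mesh color.

```python
# Hypothetical sketch (not TECA's actual code): alpha-composite NeRF
# samples (e.g. hair) in front of a rasterized mesh pixel (e.g. face).
import numpy as np

def composite_ray(sigmas, colors, deltas, ts, mesh_color, mesh_depth):
    """Composite NeRF volume samples over a mesh surface along one ray.

    sigmas:     (N,)   NeRF densities at the sample points
    colors:     (N, 3) NeRF RGB values at the sample points
    deltas:     (N,)   spacing between consecutive samples
    ts:         (N,)   depth of each sample along the ray
    mesh_color: (3,)   rasterized mesh color at this pixel
    mesh_depth: scalar depth of the mesh surface along the ray
    """
    # Per-sample opacity; samples behind the mesh are occluded by it.
    alphas = (1.0 - np.exp(-sigmas * deltas)) * (ts < mesh_depth)
    # Transmittance before each sample (and after the last one).
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))
    weights = trans[:-1] * alphas
    # Volume contribution plus the mesh seen through the residual transmittance.
    return (weights[:, None] * colors).sum(axis=0) + trans[-1] * mesh_color
```

With zero density everywhere the ray returns the mesh color unchanged; a very dense sample in front of the mesh returns that sample's color, which is the behavior that lets a hair layer cleanly occlude, or be removed from, the underlying face mesh.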