TEDRA: Text-based Editing of Dynamic and Photoreal Actors
August 28, 2024
Authors: Basavaraj Sunagad, Heming Zhu, Mohit Mendiratta, Adam Kortylewski, Christian Theobalt, Marc Habermann
cs.AI
Abstract
Over the past years, significant progress has been made in creating
photorealistic and drivable 3D avatars solely from videos of real humans.
However, a core remaining challenge is the fine-grained and user-friendly
editing of clothing styles by means of textual descriptions. To this end, we
present TEDRA, the first method allowing text-based edits of an avatar, which
maintains the avatar's high fidelity, space-time coherency, as well as
dynamics, and enables skeletal pose and view control. We begin by training a
model to create a controllable and high-fidelity digital replica of the real
actor. Next, we personalize a pretrained generative diffusion model by
fine-tuning it on various frames of the real character captured from different
camera angles, ensuring the digital representation faithfully captures the
dynamics and movements of the real person. This two-stage process lays the
foundation for our approach to dynamic human avatar editing. Utilizing this
personalized diffusion model, we modify the dynamic avatar based on a provided
text prompt using our Personalized Normal Aligned Score Distillation Sampling
(PNA-SDS) within a model-based guidance framework. Additionally, we propose a
time step annealing strategy to ensure high-quality edits. Our results
demonstrate a clear improvement over prior work in functionality and visual
quality.
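The score-distillation update and timestep annealing described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not TEDRA's actual implementation: the function names (`annealed_t_range`, `sds_grad`) and the `final_max` parameter are hypothetical, and the normal-alignment component of PNA-SDS is omitted.

```python
import numpy as np

def annealed_t_range(step, total_steps, t_min=0.02, t_max=0.98, final_max=0.5):
    """Timestep annealing sketch: linearly lower the largest diffusion
    timestep sampled during optimization, so early iterations make coarse
    edits at high noise levels and later iterations refine fine detail at
    low noise levels. Returns the (lower, upper) sampling bounds for the
    normalized timestep at this optimization step."""
    frac = step / max(total_steps - 1, 1)
    return t_min, t_max - frac * (t_max - final_max)

def sds_grad(eps_pred, eps, w=1.0):
    """Generic SDS-style gradient: the weighted residual between the
    diffusion model's noise prediction (conditioned on the edit prompt)
    and the noise that was injected into the rendering. In SDS the
    diffusion network's Jacobian is dropped, so this residual is
    back-propagated directly into the avatar's parameters."""
    return w * (np.asarray(eps_pred) - np.asarray(eps))
```

A training loop would, at each step, draw a timestep from `annealed_t_range(step, total_steps)`, noise the current rendering to that level, and apply `sds_grad` of the personalized model's prediction as the edit signal.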