DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors
September 12, 2024
Authors: Thomas Hanwen Zhu, Ruining Li, Tomas Jakab
cs.AI
Abstract
We present DreamHOI, a novel method for zero-shot synthesis of human-object
interactions (HOIs), enabling a 3D human model to realistically interact with
any given object based on a textual description. This task is complicated by
the varying categories and geometries of real-world objects and the scarcity of
datasets encompassing diverse HOIs. To circumvent the need for extensive data,
we leverage text-to-image diffusion models trained on billions of image-caption
pairs. We optimize the articulation of a skinned human mesh using Score
Distillation Sampling (SDS) gradients obtained from these models, which predict
image-space edits. However, directly backpropagating image-space gradients into
complex articulation parameters is ineffective due to the local nature of such
gradients. To overcome this, we introduce a dual implicit-explicit
representation of a skinned mesh, combining (implicit) neural radiance fields
(NeRFs) with (explicit) skeleton-driven mesh articulation. During optimization,
we transition between implicit and explicit forms, grounding the NeRF
generation while refining the mesh articulation. We validate our approach
through extensive experiments, demonstrating its effectiveness in generating
realistic HOIs.
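The optimization described above, driving scene parameters with SDS gradients from a diffusion model, can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not DreamHOI's actual implementation: `render` is a one-pixel "renderer" of a single joint angle, `predict_noise` fakes a diffusion model's noise prediction that pulls the pixel toward an arbitrary target of 0.8, and the weighting w(t) is held constant.

```python
# Toy sketch of a Score Distillation Sampling (SDS)-style update on one
# articulation parameter. All components are illustrative placeholders.
import math
import random

def render(theta):
    """Stand-in differentiable 'renderer': one pixel from a joint angle."""
    return math.sin(theta)

def d_render(theta):
    """Analytic derivative of the toy renderer w.r.t. the joint angle."""
    return math.cos(theta)

def predict_noise(x_noisy, t):
    """Stand-in for the diffusion model's noise prediction eps_hat(x_t; y, t).
    A real model conditions on the text prompt y; this toy version simply
    pushes the rendered pixel toward a made-up target value of 0.8."""
    return x_noisy - 0.8

def sds_step(theta, lr=0.1, t=0.5):
    x = render(theta)                    # image-space rendering x(theta)
    eps = random.gauss(0.0, 1.0)         # sampled Gaussian noise
    x_noisy = x + eps                    # toy forward-diffused sample
    eps_hat = predict_noise(x_noisy, t)  # predicted noise
    w = 1.0                              # weighting w(t), constant here
    # SDS gradient: w(t) * (eps_hat - eps) * dx/dtheta; note the gradient is
    # NOT backpropagated through the diffusion model itself.
    grad = w * (eps_hat - eps) * d_render(theta)
    return theta - lr * grad

random.seed(0)
theta = 0.1
for _ in range(200):
    theta = sds_step(theta)
print(render(theta))  # pixel value approaches the toy target 0.8
```

The key property this mirrors is that SDS only needs the renderer's Jacobian (`d_render` here); the diffusion model supplies an image-space direction, which the paper notes is too local to steer complex articulation parameters directly, motivating its dual implicit-explicit representation.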