PhyGenHOI: Physics-Aware Generation of 4D Dynamic Human-Object Interaction

Abstract

We address the task of generating physically accurate and visually faithful 4D Human-Object Interaction (HOI). Given a static 3D human and a target object represented as 3D Gaussian Splats (3DGS), our goal is to synthesize dynamic scenes in which the human actively engages with the object through actions such as punching or kicking, in accordance with a given input text. To this end, we introduce PhyGenHOI, a novel framework that couples generative human motion with explicit physical object simulation. We model the human as a semantic agent driven by a Motion Diffusion Model (MDM) and the object as a physical agent simulated via the Material Point Method (MPM), using 3D Gaussians as a unified, differentiable representation. We supervise their interaction through three coupled mechanisms: (1) a Windowed Attraction Loss that temporally synchronizes the generated motion so that it intercepts the object; (2) a Contact-Driven Re-simulation step that triggers physically consistent momentum transfer upon impact; and (3) a Masked Video-SDS objective that injects video-based priors to enhance contact fidelity. Experiments show that PhyGenHOI generates physically consistent 4D HOI across diverse actions, humans, and objects, outperforming baselines. Project page and videos: https://omerbenishu.github.io/PhyGenHOI/
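To make the idea of a windowed attraction term concrete, the following is a minimal sketch and not the formulation used in PhyGenHOI. It assumes the loss is the mean squared distance between one end effector (for example a hand or foot joint) and the object center, averaged only over frames inside a temporal window around a desired contact frame. The function name `windowed_attraction_loss` and the parameters `t_contact` and `half_width` are hypothetical and chosen for illustration only.

```python
def windowed_attraction_loss(effector_traj, object_traj, t_contact, half_width):
    """Mean squared effector-to-object distance inside a temporal window.

    effector_traj, object_traj: per-frame (x, y, z) positions, same length.
    t_contact: index of the frame at which contact is desired (assumed given).
    half_width: number of frames included on each side of t_contact.
    """
    if len(effector_traj) != len(object_traj):
        raise ValueError("trajectories must have the same number of frames")

    # Clamp the window to the valid frame range.
    start = max(0, t_contact - half_width)
    end = min(len(effector_traj), t_contact + half_width + 1)
    if start >= end:
        raise ValueError("window does not overlap the trajectory")

    total = 0.0
    for frame in range(start, end):
        # Squared Euclidean distance at this frame.
        total += sum(
            (p - q) ** 2
            for p, q in zip(effector_traj[frame], object_traj[frame])
        )
    # Frames outside the window contribute nothing to the loss.
    return total / (end - start)


# Toy usage: a hand sweeping along x past a stationary object at x = 2.
hand = [(float(t), 0.0, 0.0) for t in range(5)]
ball = [(2.0, 0.0, 0.0)] * 5
loss = windowed_attraction_loss(hand, ball, t_contact=2, half_width=1)
```

Restricting the penalty to a window leaves the motion outside the contact interval unconstrained, so the wind-up and follow-through can be shaped by the motion prior alone. In a pipeline like the one described, such a term would presumably be evaluated on differentiable tensors so that gradients can reach the motion generator; this pure-Python version only illustrates the arithmetic.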