PhyGenHOI: Physics-Aware 4D Generation of Dynamic Human-Object Interactions

Abstract

We address the task of generating physically accurate and visually faithful 4D Human-Object Interactions (HOI). Given a static 3D human and a target object represented as 3D Gaussian Splats (3DGS), our goal is to synthesize dynamic scenes in which the human actively engages with the object through actions such as punching or kicking, in accordance with a given input text. To this end, we introduce PhyGenHOI, a novel framework that couples generative human motion with an explicit physical simulation of the object. We model the human as a semantic agent driven by a Motion Diffusion Model (MDM) and the object as a physical agent simulated via the Material Point Method (MPM), using 3D Gaussians as a unified, differentiable representation. We supervise their interaction through three coupled mechanisms: (1) a Windowed Attraction Loss that temporally synchronizes the generative motion to intercept the object; (2) a Contact-Driven Re-simulation step that triggers physically consistent momentum transfer upon impact; and (3) a Masked Video-SDS objective that injects video-based priors to enhance contact fidelity. Experiments show that PhyGenHOI generates physically consistent 4D HOI across diverse actions, humans, and objects, outperforming baselines. Project page and videos: https://omerbenishu.github.io/PhyGenHOI/
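
As an illustration only, the idea behind a windowed attraction term can be sketched as follows. The abstract does not give the actual formulation, so everything here is an assumption: the function name `windowed_attraction_loss`, the use of a single end-effector trajectory, the squared Euclidean distance, and the half-open frame window `[t_start, t_end)` are all hypothetical choices, not the PhyGenHOI definition.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def windowed_attraction_loss(
    effector_traj: Sequence[Vec3],
    object_traj: Sequence[Vec3],
    t_start: int,
    t_end: int,
) -> float:
    """Mean squared effector-to-object distance over frames [t_start, t_end).

    Hypothetical sketch: frames outside the window contribute nothing,
    so the motion is only pulled toward the object near the intended
    contact time.
    """
    if len(effector_traj) != len(object_traj):
        raise ValueError("trajectories must have the same number of frames")
    if not (0 <= t_start < t_end <= len(effector_traj)):
        raise ValueError("window must be a non-empty range of valid frames")

    total = 0.0
    for t in range(t_start, t_end):
        # Squared Euclidean distance between effector and object at frame t.
        total += sum((e - o) ** 2 for e, o in zip(effector_traj[t], object_traj[t]))
    return total / (t_end - t_start)


# Toy example: a hand moving along x toward a static object at x = 1.
hand = [(0.25 * t, 0.0, 0.0) for t in range(5)]
ball = [(1.0, 0.0, 0.0)] * 5
loss = windowed_attraction_loss(hand, ball, 3, 5)
```

In this toy setup, restricting the window to the last two frames penalizes only the approach phase, whereas a window over all five frames would also penalize the early frames in which the hand is still far from the object. In a differentiable pipeline the same quantity would be computed on tensors so that gradients can reach the motion parameters.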