Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

Abstract

Simulating robot-world interactions is a cornerstone of Embodied AI. Recently, a few works have shown promise in using video generation to overcome the rigid visual and physical constraints of traditional simulators. However, they operate primarily in 2D space or are guided by static environmental cues, ignoring the fundamental reality that robot-world interactions are inherently 4D spatiotemporal events that require precise interactive modeling. To restore this 4D essence while ensuring precise robot control, we introduce Kinema4D, a new action-conditioned 4D generative robotic simulator that disentangles the robot-world interaction into: i) Precise 4D representation of robot controls: we drive a URDF-based 3D robot via kinematics, producing a precise 4D robot control trajectory. ii) Generative 4D modeling of environmental reactions: we project the 4D robot trajectory into a pointmap as a spatiotemporal visual signal, which conditions the generative model to synthesize the reactive dynamics of complex environments as synchronized RGB/pointmap sequences. To facilitate training, we curated a large-scale dataset called Robo4D-200k, comprising 201,426 robot interaction episodes with high-quality 4D annotations. Extensive experiments demonstrate that our method effectively simulates physically plausible, geometry-consistent, and embodiment-agnostic interactions that faithfully mirror diverse real-world dynamics. For the first time, it also shows potential for zero-shot transfer, providing a high-fidelity foundation for advancing next-generation embodied simulation.
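The two stages named in the abstract (kinematics producing a robot trajectory, then projection of that trajectory into a pointmap) can be illustrated with a toy sketch. This is not the Kinema4D implementation: a planar serial arm stands in for the URDF-based robot, a simple pinhole camera stands in for the real camera model, and every function name and parameter below (`forward_kinematics`, `sample_links`, `render_pointmap`, `robot_pointmap_sequence`, `depth`, `focal`) is a hypothetical choice made for illustration.

```python
import math


def forward_kinematics(joint_angles, link_lengths):
    """Joint positions (x, y) of a planar serial arm, base first.

    Assumption: a planar arm replaces the URDF robot mentioned in the abstract.
    """
    x, y, theta = 0.0, 0.0, 0.0
    points = [(x, y)]
    for angle, length in zip(joint_angles, link_lengths):
        theta += angle  # joint angles accumulate along the chain
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y))
    return points


def sample_links(joints, samples_per_link=20):
    """Densely sample points along each link so the arm covers many pixels."""
    samples = []
    for (x0, y0), (x1, y1) in zip(joints, joints[1:]):
        for i in range(samples_per_link):
            t = i / samples_per_link
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    samples.append(joints[-1])  # include the end effector itself
    return samples


def render_pointmap(points_3d, width, height, focal, cx, cy):
    """Project camera-frame 3D points with a pinhole model.

    Each pixel stores the (X, Y, Z) of the nearest point that lands on it,
    or None where the robot is not visible.
    """
    pointmap = [[None] * width for _ in range(height)]
    for X, Y, Z in points_3d:
        if Z <= 0:
            continue  # behind the camera
        u = int(round(focal * X / Z + cx))
        v = int(round(focal * Y / Z + cy))
        if 0 <= u < width and 0 <= v < height:
            current = pointmap[v][u]
            if current is None or Z < current[2]:
                pointmap[v][u] = (X, Y, Z)
    return pointmap


def robot_pointmap_sequence(joint_trajectory, link_lengths, depth=2.0,
                            width=64, height=64, focal=50.0):
    """One pointmap per time step: a toy '4D' (3D + time) robot signal."""
    frames = []
    for joint_angles in joint_trajectory:
        joints = forward_kinematics(joint_angles, link_lengths)
        # Assumption: the arm moves in a plane at fixed depth in front of the camera.
        points = [(x, y, depth) for x, y in sample_links(joints)]
        frames.append(
            render_pointmap(points, width, height, focal, width / 2, height / 2)
        )
    return frames


if __name__ == "__main__":
    trajectory = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.4]]
    frames = robot_pointmap_sequence(trajectory, link_lengths=[0.5, 0.5])
    visible = sum(px is not None for row in frames[-1] for px in row)
    print(f"{len(frames)} frames, {visible} robot pixels in the last frame")
```

In a setup like the one the abstract describes, a sequence of such per-frame pointmaps would serve as the conditioning signal for the generative model; how that conditioning is actually wired into the model is not specified here and is left out of the sketch.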
