HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Abstract

We present HSImul3R, a unified framework for simulation-ready 3D reconstruction of human-scene interactions (HSI) from casual captures, including sparse-view images and monocular videos. Existing methods suffer from a perception-simulation gap: visually plausible reconstructions often violate physical constraints, which leads to instability in physics engines and failure in embodied AI applications. To bridge this gap, we introduce a physically grounded bidirectional optimization pipeline that treats the physics simulator as an active supervisor and jointly refines human dynamics and scene geometry. In the forward direction, we employ Scene-targeted Reinforcement Learning to optimize human motion under dual supervision of motion fidelity and contact stability. In the reverse direction, we propose Direct Simulation Reward Optimization, which uses simulation feedback on gravitational stability and interaction success to refine scene geometry. We further present HSIBench, a new benchmark dataset covering diverse objects and interaction scenarios. Extensive experiments demonstrate that HSImul3R produces the first stable, simulation-ready HSI reconstructions and can be directly deployed to real-world humanoid robots.