HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Abstract

We present HSImul3R, a unified framework for simulation-ready 3D reconstruction of human-scene interactions (HSI) from casual captures, including sparse-view images and monocular videos. Existing methods suffer from a perception-simulation gap: visually plausible reconstructions often violate physical constraints, leading to instability in physics engines and failure in embodied AI applications. To bridge this gap, we introduce a physically grounded bidirectional optimization pipeline that treats the physics simulator as an active supervisor to jointly refine human dynamics and scene geometry. In the forward direction, we employ Scene-targeted Reinforcement Learning to optimize human motion under the dual supervision of motion fidelity and contact stability. In the reverse direction, we propose Direct Simulation Reward Optimization, which leverages simulation feedback on gravitational stability and interaction success to refine scene geometry. We further present HSIBench, a new benchmark with diverse objects and interaction scenarios. Extensive experiments demonstrate that HSImul3R produces the first stable, simulation-ready HSI reconstructions and can be directly deployed to real-world humanoid robots.
