Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data

Abstract

Creating accurate physical simulations directly from real-world robot motion holds great value for safe, scalable, and affordable robot learning, yet remains exceptionally challenging. Real robot data suffers from occlusions, noisy camera poses, and dynamic scene elements, all of which hinder the creation of geometrically accurate and photorealistic digital twins of unseen objects. We introduce a novel real-to-sim framework that tackles all of these challenges at once. Our key insight is a hybrid scene representation that merges the photorealistic rendering of 3D Gaussian Splatting with explicit object meshes suitable for physics simulation within a single representation. We propose an end-to-end optimization pipeline that leverages differentiable rendering and differentiable physics within MuJoCo to jointly refine all scene components, from object geometry and appearance to robot poses and physical parameters, directly from raw and imprecise robot trajectories. This unified optimization allows us to simultaneously achieve high-fidelity object mesh reconstruction, generate photorealistic novel views, and perform annotation-free robot pose calibration. We demonstrate the effectiveness of our approach both in simulation and on challenging real-world sequences using an ALOHA 2 bimanual manipulator, enabling more practical and robust real-to-simulation pipelines.
