
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning

June 19, 2025
Authors: Boyu Li, Siyuan He, Hang Xu, Haoqi Yuan, Yu Zang, Liwei Hu, Junpeng Yue, Zhenxiong Jiang, Pengbo Hu, Börje F. Karlsson, Yehui Tang, Zongqing Lu
cs.AI

Abstract

Developing embodied agents capable of performing complex interactive tasks in real-world scenarios remains a fundamental challenge in embodied AI. Although recent advances in simulation platforms have greatly increased the task diversity available for training embodied Vision Language Models (VLMs), most platforms rely on simplified robot morphologies and bypass the stochastic nature of low-level execution, which limits their transferability to real-world robots. To address these issues, we present DualTHOR, a physics-based simulation platform for complex dual-arm humanoid robots, built upon an extended version of AI2-THOR. Our simulator includes real-world robot assets, a task suite for dual-arm collaboration, and inverse kinematics solvers for humanoid robots. We also introduce a contingency mechanism that incorporates potential failures through physics-based low-level execution, bridging the gap to real-world scenarios. Our simulator enables a more comprehensive evaluation of the robustness and generalization of VLMs in household environments. Extensive evaluations reveal that current VLMs struggle with dual-arm coordination and exhibit limited robustness in realistic environments with contingencies, highlighting the importance of using our simulator to develop more capable VLMs for embodied tasks. The code is available at https://github.com/ds199895/DualTHOR.git.