EnerVerse-AC: Envisioning Embodied Environments with Action Condition
May 14, 2025
Authors: Yuxin Jiang, Shengcong Chen, Siyuan Huang, Liliang Chen, Pengfei Zhou, Yue Liao, Xindong He, Chiming Liu, Hongsheng Li, Maoqing Yao, Guanghui Ren
cs.AI
Abstract
Robotic imitation learning has advanced from solving static tasks to
addressing dynamic interaction scenarios, but testing and evaluation remain
costly and challenging due to the need for real-time interaction with dynamic
environments. We propose EnerVerse-AC (EVAC), an action-conditional world model
that generates future visual observations based on an agent's predicted
actions, enabling realistic and controllable robotic inference. Building on
prior architectures, EVAC introduces a multi-level action-conditioning
mechanism and ray map encoding for dynamic multi-view image generation while
expanding training data with diverse failure trajectories to improve
generalization. As both a data engine and evaluator, EVAC augments
human-collected trajectories into diverse datasets and generates realistic,
action-conditioned video observations for policy testing, eliminating the need
for physical robots or complex simulations. This approach significantly reduces
costs while maintaining high fidelity in robotic manipulation evaluation.
Extensive experiments validate the effectiveness of our method. Code,
checkpoints, and datasets can be found at
<https://annaj2178.github.io/EnerverseAC.github.io>.
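The abstract describes EVAC serving as an evaluator: a policy is tested in closed loop against a world model that predicts future observations from the policy's actions, replacing a physical robot or simulator. The toy sketch below illustrates only this closed-loop structure; the class and function names are hypothetical placeholders, not the paper's actual API or model.

```python
import numpy as np

class ToyWorldModel:
    """Stand-in for an action-conditional video world model: given the
    current observation and an action, predict the next observation.
    (Illustrative only; EVAC's real model is a learned video generator.)"""

    def __init__(self, obs_shape=(64, 64, 3)):
        self.obs_shape = obs_shape

    def predict(self, obs, action):
        # A real model would render the next frame conditioned on the action;
        # here we just brighten the frame by the action's magnitude.
        return np.clip(obs + float(np.linalg.norm(action)), 0.0, 1.0)

def rollout(policy, world_model, init_obs, horizon=10):
    """Evaluate a policy entirely inside the world model: no physical
    robot, no simulator -- observations come from the model itself."""
    obs, frames = init_obs, [init_obs]
    for _ in range(horizon):
        action = policy(obs)            # policy acts on predicted observation
        obs = world_model.predict(obs, action)  # model closes the loop
        frames.append(obs)
    return frames

# Example: a trivial policy that always outputs a small fixed action.
wm = ToyWorldModel()
frames = rollout(lambda obs: np.array([0.01, 0.0]), wm, np.zeros(wm.obs_shape))
print(len(frames))  # 11: the initial observation plus 10 predicted frames
```

The key design point mirrored here is that the policy never touches a real environment during evaluation: as long as the world model's predictions are faithful and action-conditioned, rollout statistics can stand in for costly physical trials.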