RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
March 4, 2026
Authors: Yinpei Dai, Hongze Fu, Jayjun Lee, Yuejiang Liu, Haoran Zhang, Jianing Yang, Chelsea Finn, Nima Fazeli, Joyce Chai
cs.AI
Abstract
Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluations remain confined to narrow, non-standardized settings, which limits systematic understanding, model comparison, and the measurement of progress. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios. Our benchmark comprises 16 manipulation tasks constructed under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We further develop a suite of 14 memory-augmented VLA variants built on the π0.5 backbone to systematically explore different memory representations across multiple integration strategies. Experimental results show that the effectiveness of memory representations is highly task-dependent, with each design offering distinct advantages and limitations across different tasks. Videos and code are available on the project website: https://robomme.github.io.