OpenGVL - Benchmarking Visual Temporal Progress for Data Curation

September 22, 2025
作者: Paweł Budzianowski, Emilia Wiśnios, Gracjan Góral, Igor Kulakov, Viktor Petrenko, Krzysztof Walas
cs.AI

Abstract

Data scarcity remains one of the most limiting factors in driving progress in robotics. However, the amount of available robotics data in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable temporal task-completion prediction could help annotate and curate this data automatically at scale. The recently proposed Generative Value Learning (GVL) approach leverages the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building upon GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse, challenging manipulation tasks involving both robotic and human embodiments. We evaluate the capabilities of publicly available open-source foundation models, showing that open-source model families significantly underperform their closed-source counterparts, achieving only approximately 70% of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at github.com/budzianowski/opengvl.
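To make the evaluation idea concrete: GVL-style benchmarks typically shuffle the frames of an episode, ask a VLM to predict per-frame task progress, and then measure how well the predictions recover the true temporal order via a rank correlation (GVL calls this Value-Order Correlation). The sketch below illustrates that scoring loop under assumptions; `predict_progress` is a hypothetical placeholder for a real VLM query, and the tie-free Spearman formula is used for simplicity.

```python
# Hedged sketch of a GVL-style Value-Order-Correlation (VOC) score.
# `predict_progress` is a hypothetical stand-in for an actual VLM call;
# the real OpenGVL pipeline differs in prompting and metric details.
import random

def spearman_rank_correlation(a, b):
    """Spearman correlation of two equal-length sequences (no tie handling)."""
    n = len(a)
    rank = lambda xs: {x: r for r, x in enumerate(sorted(xs))}
    ra, rb = rank(a), rank(b)
    d2 = sum((ra[x] - rb[y]) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def value_order_correlation(frames, predict_progress):
    """Shuffle frames, query per-frame progress, correlate with true order."""
    shuffled = random.sample(range(len(frames)), k=len(frames))
    preds = [predict_progress(frames[i]) for i in shuffled]
    # High correlation means the predictions recover the true temporal order,
    # making the episode a good candidate to keep during data curation.
    return spearman_rank_correlation(preds, [float(i) for i in shuffled])
```

An episode whose predicted progress rises monotonically with time scores near 1.0, while noisy or failed episodes score near 0, which is what makes the metric usable as an automated curation filter.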